AM

Database Normalization Flashcards

Understanding Normalization

  • Normalization is organizing data by separating it into multiple tables.
  • It aims to:
    • Reduce data duplication.
    • Increase data integrity.
  • Instead of storing all sales information for a customer in one table, normalization separates the data.
  • Data duplication increases database size.
  • Normalization increases data integrity.
  • Instead of storing customer state information in every customer record, it's split out into a separate state table.
  • Tables are linked using foreign key and primary key relationships.
  • Changing the state in the state table automatically updates it everywhere else via the link.

Normalization Levels

  • There are five available normalization levels.
  • First Normal Form (1NF).
  • Second Normal Form (2NF).
  • Third Normal Form (3NF).
  • Fourth Normal Form (4NF).
  • Fifth Normal Form (5NF).
  • Most database administrators use up to the Third Normal Form.
  • Higher levels add complexity to the database design and data storage considerations.
  • The first three levels effectively reduce data duplication and maintain data integrity.

First Normal Form (1NF)

  • A table should contain no repeating groups or columns.
  • Example: Customer table with address one and address two columns. While seemingly repetitive, the data stored is different (street address vs. suite number).
  • SQL Server Example: Sales.Customer table has specific fields like CustomerID, PersonID, StoreID, TerritoryID, AccountNumber, RowGuid, and ModifiedDate.
  • Sales.SalesOrderHeader table contains sales information like RevisionNumber, OrderDate, DueDate, ShipDate, and Status.
  • Storing SalesOrderHeader fields in the Customer table would lead to data repetition.
  • Instead, the tables are separated and linked through the CustomerID.
  • Achieving First Normal Form involves iterative data design.

Second Normal Form (2NF)

  • All attributes in a table must depend on the primary key.
  • The primary key ensures uniqueness across each row.
  • Values in different columns depend on each other and the primary key.
  • Person table example in AdventureWorks database.
  • BusinessEntityID is the primary key for the person table.
  • Other fields include PersonType, NameStyle, Title, etc.
  • All fields in the Person table depend on the BusinessEntityID.
  • State is not listed in the Person table as it's more related to a StateProvince table.
  • In Second Normal Form, every field in the table must have a direct relationship and dependency on the primary key.

Third Normal Form (3NF)

  • No attributes in a table should depend on non-key attributes.
  • Example: Placing a state attribute directly in a person table violates this rule, as state depends on the address, not the person directly.
  • Avoid repeating groups of the same thing.
  • Example: Do not include item name, price, and quantity in a customer table, as those values can be repeated across multiple sales for that customer and are not directly dependent on the customer.
  • Again, we looked at that Person.Person table as an example. You wouldn't put the state in that person table.
  • A person does have a state in their address.
  • Person.Address table breaks down address information with AddressID and StateProvinceID.
  • Person.AddressType table contains more detailed address information.
  • Drilling down into these tables reveals how a person's address maps to a state.
  • Normalization splits data across multiple tables to achieve Third Normal Form.
  • The AdventureWorks database applies normalization to avoid data duplication.
  • Data integrity is improved by maintaining data in appropriate, separate tables.
  • This approach prepares us for multi-table queries.