Normalization is organizing data by separating it into multiple tables.
It aims to:
Reduce data duplication.
Increase data integrity.
Instead of storing all sales information for a customer in one table, normalization separates the data.
Data duplication increases database size.
Normalization increases data integrity.
Instead of storing customer state information in every customer record, it's split out into a separate state table.
Tables are linked using foreign key and primary key relationships.
Changing the state in the state table automatically updates it everywhere else via the link.
Normalization Levels
There are five available normalization levels.
First Normal Form (1NF).
Second Normal Form (2NF).
Third Normal Form (3NF).
Fourth Normal Form (4NF).
Fifth Normal Form (5NF).
Most database administrators use up to the Third Normal Form.
Higher levels add complexity to the database design and data storage considerations.
The first three levels effectively reduce data duplication and maintain data integrity.
First Normal Form (1NF)
A table should contain no repeating groups or columns.
Example: Customer table with address one and address two columns. While seemingly repetitive, the data stored is different (street address vs. suite number).
SQL Server Example: Sales.Customer table has specific fields like CustomerID, PersonID, StoreID, TerritoryID, AccountNumber, RowGuid, and ModifiedDate.
Sales.SalesOrderHeader table contains sales information like RevisionNumber, OrderDate, DueDate, ShipDate, and Status.
Storing SalesOrderHeader fields in the Customer table would lead to data repetition.
Instead, the tables are separated and linked through the CustomerID.
Achieving First Normal Form involves iterative data design.
Second Normal Form (2NF)
All attributes in a table must depend on the primary key.
The primary key ensures uniqueness across each row.
Values in different columns depend on each other and the primary key.
Person table example in AdventureWorks database.
BusinessEntityID is the primary key for the person table.
Other fields include PersonType, NameStyle, Title, etc.
All fields in the Person table depend on the BusinessEntityID.
State is not listed in the Person table as it's more related to a StateProvince table.
In Second Normal Form, every field in the table must have a direct relationship and dependency on the primary key.
Third Normal Form (3NF)
No attributes in a table should depend on non-key attributes.
Example: Placing a state attribute directly in a person table violates this rule, as state depends on the address, not the person directly.
Avoid repeating groups of the same thing.
Example: Do not include item name, price, and quantity in a customer table, as those values can be repeated across multiple sales for that customer and are not directly dependent on the customer.
Again, we looked at that Person.Person table as an example. You wouldn't put the state in that person table.
A person does have a state in their address.
Person.Address table breaks down address information with AddressID and StateProvinceID.
Person.AddressType table contains more detailed address information.
Drilling down into these tables reveals how a person's address maps to a state.
Normalization splits data across multiple tables to achieve Third Normal Form.
The AdventureWorks database applies normalization to avoid data duplication.
Data integrity is improved by maintaining data in appropriate, separate tables.
This approach prepares us for multi-table queries.