Database Normalization Flashcards
Understanding Normalization
- Normalization is organizing data by separating it into multiple tables.
- It aims to:
- Reduce data duplication.
- Increase data integrity.
- Instead of storing all sales information for a customer in one table, normalization separates the data.
- Data duplication increases database size.
- Normalization increases data integrity.
- Instead of storing customer state information in every customer record, it's split out into a separate state table.
- Tables are linked using foreign key and primary key relationships.
- Changing the state in the state table automatically updates it everywhere else via the link.
Normalization Levels
- There are five available normalization levels.
- First Normal Form (1NF).
- Second Normal Form (2NF).
- Third Normal Form (3NF).
- Fourth Normal Form (4NF).
- Fifth Normal Form (5NF).
- Most database administrators use up to the Third Normal Form.
- Higher levels add complexity to the database design and data storage considerations.
- The first three levels effectively reduce data duplication and maintain data integrity.
- A table should contain no repeating groups or columns.
- Example: Customer table with address one and address two columns. While seemingly repetitive, the data stored is different (street address vs. suite number).
- SQL Server Example: Sales.Customer table has specific fields like CustomerID, PersonID, StoreID, TerritoryID, AccountNumber, RowGuid, and ModifiedDate.
- Sales.SalesOrderHeader table contains sales information like RevisionNumber, OrderDate, DueDate, ShipDate, and Status.
- Storing SalesOrderHeader fields in the Customer table would lead to data repetition.
- Instead, the tables are separated and linked through the CustomerID.
- Achieving First Normal Form involves iterative data design.
- All attributes in a table must depend on the primary key.
- The primary key ensures uniqueness across each row.
- Values in different columns depend on each other and the primary key.
- Person table example in AdventureWorks database.
- BusinessEntityID is the primary key for the person table.
- Other fields include PersonType, NameStyle, Title, etc.
- All fields in the Person table depend on the BusinessEntityID.
- State is not listed in the Person table as it's more related to a StateProvince table.
- In Second Normal Form, every field in the table must have a direct relationship and dependency on the primary key.
- No attributes in a table should depend on non-key attributes.
- Example: Placing a state attribute directly in a person table violates this rule, as state depends on the address, not the person directly.
- Avoid repeating groups of the same thing.
- Example: Do not include item name, price, and quantity in a customer table, as those values can be repeated across multiple sales for that customer and are not directly dependent on the customer.
- Again, we looked at that Person.Person table as an example. You wouldn't put the state in that person table.
- A person does have a state in their address.
- Person.Address table breaks down address information with AddressID and StateProvinceID.
- Person.AddressType table contains more detailed address information.
- Drilling down into these tables reveals how a person's address maps to a state.
- Normalization splits data across multiple tables to achieve Third Normal Form.
- The AdventureWorks database applies normalization to avoid data duplication.
- Data integrity is improved by maintaining data in appropriate, separate tables.
- This approach prepares us for multi-table queries.