Normalization in Relational Databases
Understanding Normalization
Introduction to Normalization
- Normalization is used in relational database systems to design the table structure.
- It involves applying certain concepts and principles to data storage.
- The lesson covers the first three forms of normalization.
- Normalization organizes data in a database, following specific rules.
- It results in the creation of multiple tables and the establishment of relationships between them.
Driving Factors Behind Normalization
- Reduction in Data Redundancy: Avoid storing the same information in multiple locations within the database (even the same table).
- Wasted Space: Redundancy wastes disk space and degrades disk performance.
- Maintenance Problems: When data changes (e.g., an employee's address), you need to update every instance of that data, which is time-consuming and error-prone.
- Disk reads become slower if you don't do regular disc maintenance for defragmentation, which impacts the performance of your database queries.
- Database administrators typically aim for the first three normal forms.
- Fourth and fifth forms exist but are less commonly implemented because they require extra work and don't significantly impact database functionality.
- Each form has rules to validate its implementation.
- Rule: No repeating groups in an individual table.
- Have a separate table for each set of related data.
- Each set of related data is identified with a key.
- Example: Instead of including sales items (item name, price, quantity) in a customer table, create a separate sales table and establish a relationship between the customer and sales tables.
- This avoids repeated data across multiple sales for the same customer.
- An entity is in 2NF if all of its attributes depend on the whole primary key.
- The values in the columns have a dependency on the other columns.
- Example: In a customer table, store customer-related data such as first name, last name, address, city, state, and zip code.
- All of these attributes depend on the primary key (e.g., customer number).
- Rule: Eliminate fields in a table that do not depend on the key.
- Example: The state or country of a person does not depend on the key that uniquely identifies the person.
- Store states in a state table and countries in a country table, then relate these tables together.
- Avoids duplicate entries for states and countries in the customer table for customers in the same locations.
Considerations with Normalization
- Normalization can make querying more difficult because it involves joining multiple tables.
- Need to focus on how to join information from one table to another to retrieve the desired data.
- Data warehousing scenarios may involve de-normalizing data for more efficient querying.
- De-normalized data is often more efficient to query than normalized data.
Conclusion
- Understanding normalization is essential for grasping the concepts behind multiple tables and the need to query them effectively.
- Normalization principles will be further illustrated through examples in subsequent lessons on querying multiple tables.