AM

Normalization in Relational Databases

Understanding Normalization

Introduction to Normalization

  • Normalization is used in relational database systems to design the table structure.
  • It involves applying certain concepts and principles to data storage.
  • The lesson covers the first three forms of normalization.
  • Normalization organizes data in a database, following specific rules.
  • It results in the creation of multiple tables and the establishment of relationships between them.

Driving Factors Behind Normalization

  • Reduction in Data Redundancy: Avoid storing the same information in multiple locations within the database (even the same table).
  • Wasted Space: Redundancy wastes disk space and degrades disk performance.
  • Maintenance Problems: When data changes (e.g., an employee's address), you need to update every instance of that data, which is time-consuming and error-prone.
  • Disk reads become slower if you don't do regular disc maintenance for defragmentation, which impacts the performance of your database queries.

Normal Forms

  • Database administrators typically aim for the first three normal forms.
  • Fourth and fifth forms exist but are less commonly implemented because they require extra work and don't significantly impact database functionality.
  • Each form has rules to validate its implementation.

First Normal Form (1NF)

  • Rule: No repeating groups in an individual table.
  • Have a separate table for each set of related data.
  • Each set of related data is identified with a key.
  • Example: Instead of including sales items (item name, price, quantity) in a customer table, create a separate sales table and establish a relationship between the customer and sales tables.
    • This avoids repeated data across multiple sales for the same customer.

Second Normal Form (2NF)

  • An entity is in 2NF if all of its attributes depend on the whole primary key.
  • The values in the columns have a dependency on the other columns.
  • Example: In a customer table, store customer-related data such as first name, last name, address, city, state, and zip code.
    • All of these attributes depend on the primary key (e.g., customer number).

Third Normal Form (3NF)

  • Rule: Eliminate fields in a table that do not depend on the key.
  • Example: The state or country of a person does not depend on the key that uniquely identifies the person.
    • Store states in a state table and countries in a country table, then relate these tables together.
    • Avoids duplicate entries for states and countries in the customer table for customers in the same locations.

Considerations with Normalization

  • Normalization can make querying more difficult because it involves joining multiple tables.
  • Need to focus on how to join information from one table to another to retrieve the desired data.
  • Data warehousing scenarios may involve de-normalizing data for more efficient querying.
  • De-normalized data is often more efficient to query than normalized data.

Conclusion

  • Understanding normalization is essential for grasping the concepts behind multiple tables and the need to query them effectively.
  • Normalization principles will be further illustrated through examples in subsequent lessons on querying multiple tables.