Normalization

Unit 4: Normalization

4.1. Need of Normalization

  • Consequences of Bad Design:

    • Insert, Update, and Delete anomalies.

    • Database normalization minimizes redundancy and dependencies in data.

    • It involves splitting large tables into smaller ones and defining relationships between them.

4.2. Normalization

  • Definition:

    • A database schema design technique that organizes data to minimize redundancy and dependency.

    • Developed by IBM researcher E.F. Codd in the 1970s.

    • Clarifies organization of data in a database and is achieved through operational rules called 'forms'.

4.2.1. First Normal Form (1NF)
  • Tables must have no repeating groups or arrays.

  • Each cell should contain a single value, and each record must be unique.

  • Example:

    • Employee table shows employees working in multiple departments.

4.2.2. Second Normal Form (2NF)
  • Must meet 1NF requirements.

  • No Partial Dependency; all non-prime attributes must depend on the whole candidate key.

  • Attributes:

    • Prime attribute: Part of the candidate key

    • Non-prime attribute: Not part of the candidate key

4.2.3. Third Normal Form (3NF)
  • Must be in 2NF.

  • No transitive dependency; non-prime attributes should not depend transitively on the primary key.

4.2.4. Boyce-Codd Normal Form (BCNF)
  • Higher version of 3NF.

  • A table is in BCNF if:

    • It is in 3NF.

    • For every functional dependency (X → Y), X is a super key.

Anomalies in DBMS

  • The types of anomalies when the database is not normalized:

    1. Insertion Anomaly: Inability to add data due to constraints. Example: New employee under training without a department.

    2. Update Anomaly: Inconsistent data when updating records across multiple entries. Example: Employee with two records in different departments.

    3. Deletion Anomaly: Losing valuable information when deleting records. Example: Deleting all employees from a closed department.

Advantages of Normalization

  1. Smaller databases due to elimination of duplicate data.

  2. Improved performance; faster data access due to reduced size.

  3. Narrower tables lead to more records being accommodated per page.

  4. Fewer indexes per table expedite maintenance tasks.

  5. Selective table joining allows for querying only necessary data.

Disadvantages of Normalization

  1. Increased complexity due to more tables; harder to comprehend.

  2. Tables may contain codes rather than actual data, necessitating lookup tables.

  3. Difficult to query as the model may not cater to ad hoc queries.

  4. Performance may decline with higher normal form levels due to increased joins.

  5. Requires knowledge of normal forms to apply effectively; poor execution can lead to anomalies and inconsistencies.

Database Normalization Rules

  • First Normal Form (1NF) Rules:

    • Each table cell must contain a single value.

    • Each record needs to be unique.

Employee Table Examples

  1. Employee Table Before 1NF:

    • Contains multiple departments in one cell.

  2. Employee Table Following 1NF:

    • Split into multiple records for each employee-department relation.

Second Normal Form (2NF)
  • A table is in 2NF if:

    1. It is in 1NF.

    2. It has no Partial Dependency.

  • Example: Employee must depend on both prime attributes (Emp_Id, Dept_Id).

Third Normal Form (3NF)
  • Condition for achieving 3NF:

    1. Must fulfill Second Normal Form requirements.

    2. No non-prime attribute is transitively dependent on the prime key attribute.

Transitive Functional Dependencies
  • If a non-key attribute depends on another non-key attribute, it's considered transitive and should be moved to a new table.

Boyce and Codd Normal Form (BCNF)
  • Definition:

    • BCNF is a stricter version of 3NF addressing certain types of anomalies not handled by 3NF.

  • For a table to be in BCNF, it must:

    1. Be in 3NF.

    2. Ensure that for every functional dependency (X → Y), X is a super key.