Normalization Lecture Notes

Normalization

Introduction

Normalization is a database design technique to reduce data redundancy and improve data integrity.

Learning Objectives

Define normalization.
Explain and identify database anomalies.
Define and identify functional dependencies.
Normalize relations to:
- 1st Normal Form (1NF)
- 2nd Normal Form (2NF)
- 3rd Normal Form (3NF)

Motivation for Normalization

Normalization addresses what happens if we don’t normalize.

Problems with Unnormalized Data

Consider a table with student information and subject details.
Example Table:
- Student ID#, Student Name, Campus Address, Degree, Phone, Subject ID, Subject Title, Lecturer Name, Lecturer Office, Lecturer Phone, Semester, Grade
Issues:
- Data redundancy: Student information is repeated for each subject a student takes.
- Update anomalies: Changing a student's address requires updating multiple rows.
- Insertion anomalies: Difficult to add a new subject without a student enrolled.
- Deletion anomalies: Deleting a student record may remove subject information.

Anomalies in Denormalized Data

Insertion Anomaly: A new course cannot be added until at least one student has enrolled.
Deletion Anomaly: If a student withdraws, information about a course and its fee may be lost.
Update Anomaly: If a course fee changes, it must be updated in multiple records, leading to potential inconsistencies.

Normalization Definition

A relation is normalized if all determinants are candidate keys.
Normalization is a technique to remove undesired redundancy by breaking one large table into several smaller tables.

Invoice Example

Consider an invoice example in a spreadsheet format.
Initial Format:
- Invoice Number, Date, Customer Name, Customer Address, Sales Person, Terms, Product ID, Product Name, Unit Price, Quantity, Amount, Sub Total
This format is not relational.

Steps to Normalize the Invoice Example

Break the spreadsheet into two tables:
- Table 1: Invoice information (Invoice Number, Date, Customer Name, Customer Address, Sales Person, Terms, Sub Total, Discount, Sales Tax, Shipping).
- Table 2: Product details (Product ID, Product Name, Unit Price, Quantity, Amount).
Connect the tables using a foreign key (Invoice Number).

Further Refinement of Invoice Example

Separate product-specific information from order-specific information.
Create three tables:
- Invoice (Invoice Number, Date, Customer ID, Sales Person ID, Terms, Sub Total, Discount, Sales Tax, Shipping).
- InvoiceLineItem (Invoice Number, Product ID, Quantity).
- Product (Product ID, Product Name, Unit Price).
Consider derived attributes like 'Amount' which can be calculated from Quantity and Unit Price.
Introduce separate tables for Sales Person and Customer details to avoid redundancy.

Normalized Relations and ER Diagram

Customer (CustomerNumber, CustomerName, CustomerAddress).
Clerk (ClerkNumber, ClerkName).
Product (ProductNumber, ProductDescription).
Invoice (InvoiceNumber, Date, CustomerNumber, ClerkNumber).
InvoiceLineItem (InvoiceNumber, ProductNumber, UnitPrice, Quantity).

Functional Dependency

A functional dependency concerns values of attributes in a relation.
A set of attributes X determines another set of attributes Y if each value of X is associated with only one value of Y.
- Written as $X \rightarrow Y$

Functional Dependency Definitions

Determinants: The attribute(s) on the left-hand side of the arrow.
Key and Non-Key attributes: Attributes are either part of the primary key or not.
Partial functional dependency: A non-key attribute is functionally dependent on part of the primary key.
Transitive dependency: A functional dependency between two or more non-key attributes.

Armstrong’s Axioms

Functional dependencies can be identified using Armstrong’s Axioms:
1. Reflexivity: If $B \subseteq A$ , then $A \rightarrow B$
  - Example: Student_ID, name -> name
2. Augmentation: If $A \rightarrow B$ , then $AC \rightarrow BC$
  - Example: StudentID -> name => StudentID, surname -> name, surname
3. Transitivity: If $A \rightarrow B$ and $B \rightarrow C$ , then $A \rightarrow C$
  - Example: ID -> birthdate and birthdate -> age then ID -> age
Given $A = {X1, X2, …, Xn}$ and $B = (Y1, Y2, …, Yn)$

Steps in Normalization

First Normal Form (1NF): Remove repeating groups.
Second Normal Form (2NF): Remove partial dependencies.
Third Normal Form (3NF): Remove transitive dependencies.

First Normal Form

Remove Repeating Groups: Repeating groups of attributes cannot be represented in a flat, two-dimensional table.
Remove cells with multiple values (keep atomic data).
Break them into two and use PK/FK to connect
Example:
- From Order-Item (Order#, Customer#, (Item#, Desc, Qty)) to
- Order-Item (Order#, Item#, Desc, Qty) and Order (Order#, Customer#)

Second Normal Form

Remove Partial Dependencies: A non-key attribute cannot be identified by part of a composite key.
Example:
- From Order-Item (Order#, Item#, Desc, Qty) to
- Order-Item (Order#, Item#, Qty) and Item (Item#, Desc)

Partial Dependency Anomalies

Example Table: Order-Item (Order#, Item#, Desc, Qty)
Anomalies include:
- Update Anomaly: Changing item description requires updating multiple rows.
- Deletion Anomaly: Data for the last item is lost when the last order for that item is deleted.
- Insertion Anomaly: Cannot add a new item until it is ordered.

Solution to these Anomalies

Split the table into two:
- Order-Item (Order#, Item#, Qty)
- Item (Item#, Desc)
This resolves the update, delete, and insert anomalies.

Third Normal Form

Remove Transitive Dependencies: A non-key attribute cannot be identified by another non-key attribute.
Example:
- From Employee (Emp#, Ename, Dept#, Dname) to
- Employee (Emp#, Ename, Dept#) and Department (Dept#, Dname)

Transitive Dependency Anomalies

Example Table: Employee (Emp#, Ename, Dept#, Dname)
Anomalies include:
- Update Anomaly: Changing department name requires updating multiple rows.
- Deletion Anomaly: Data for a department is lost when the last employee for that department is deleted.
- Insertion Anomaly: Cannot add a new department until an employee is allocated to it.

Solution to these Anomalies

Split the table into two:
- Employee (Emp#, Ename, Dept#)
- Department (Dept#, Dname)
This resolves the update, delete, and insert anomalies.

Summary of Normalization

First Normal Form: Remove repeating groups.
Second Normal Form: Remove partial dependencies.
Third Normal Form: Remove transitive dependencies.

Normalization vs Denormalization

Normalization:
- Reduces redundancy and allows users to insert, modify, and delete rows without errors or inconsistencies (anomalies).
Denormalization:
- Increases query speed but requires extra work to maintain data consistency during updates.
- May be used to improve the performance of time-critical operations.

Examinable Content

Normalization Process
Anomalies
Functional dependencies