Normalization Lecture Notes

Normalization

Introduction

  • Normalization is a database design technique to reduce data redundancy and improve data integrity.

Learning Objectives

  • Define normalization.

  • Explain and identify database anomalies.

  • Define and identify functional dependencies.

  • Normalize relations to:

    • 1st Normal Form (1NF)

    • 2nd Normal Form (2NF)

    • 3rd Normal Form (3NF)

Motivation for Normalization

  • Normalization addresses what happens if we don’t normalize.

Problems with Unnormalized Data

  • Consider a table with student information and subject details.

  • Example Table:

    • Student ID#, Student Name, Campus Address, Degree, Phone, Subject ID, Subject Title, Lecturer Name, Lecturer Office, Lecturer Phone, Semester, Grade

  • Issues:

    • Data redundancy: Student information is repeated for each subject a student takes.

    • Update anomalies: Changing a student's address requires updating multiple rows.

    • Insertion anomalies: Difficult to add a new subject without a student enrolled.

    • Deletion anomalies: Deleting a student record may remove subject information.

Anomalies in Denormalized Data

  • Insertion Anomaly: A new course cannot be added until at least one student has enrolled.

  • Deletion Anomaly: If a student withdraws, information about a course and its fee may be lost.

  • Update Anomaly: If a course fee changes, it must be updated in multiple records, leading to potential inconsistencies.

Normalization Definition

  • A relation is normalized if all determinants are candidate keys.

  • Normalization is a technique to remove undesired redundancy by breaking one large table into several smaller tables.

Invoice Example

  • Consider an invoice example in a spreadsheet format.

  • Initial Format:

    • Invoice Number, Date, Customer Name, Customer Address, Sales Person, Terms, Product ID, Product Name, Unit Price, Quantity, Amount, Sub Total

  • This format is not relational.

Steps to Normalize the Invoice Example

  • Break the spreadsheet into two tables:

    • Table 1: Invoice information (Invoice Number, Date, Customer Name, Customer Address, Sales Person, Terms, Sub Total, Discount, Sales Tax, Shipping).

    • Table 2: Product details (Product ID, Product Name, Unit Price, Quantity, Amount).

  • Connect the tables using a foreign key (Invoice Number).

Further Refinement of Invoice Example

  • Separate product-specific information from order-specific information.

  • Create three tables:

    • Invoice (Invoice Number, Date, Customer ID, Sales Person ID, Terms, Sub Total, Discount, Sales Tax, Shipping).

    • InvoiceLineItem (Invoice Number, Product ID, Quantity).

    • Product (Product ID, Product Name, Unit Price).

  • Consider derived attributes like 'Amount' which can be calculated from Quantity and Unit Price.

  • Introduce separate tables for Sales Person and Customer details to avoid redundancy.

Normalized Relations and ER Diagram

  • Customer (CustomerNumber, CustomerName, CustomerAddress).

  • Clerk (ClerkNumber, ClerkName).

  • Product (ProductNumber, ProductDescription).

  • Invoice (InvoiceNumber, Date, CustomerNumber, ClerkNumber).

  • InvoiceLineItem (InvoiceNumber, ProductNumber, UnitPrice, Quantity).

Functional Dependency

  • A functional dependency concerns values of attributes in a relation.

  • A set of attributes X determines another set of attributes Y if each value of X is associated with only one value of Y.

    • Written as XYX \rightarrow Y

Functional Dependency Definitions

  • Determinants: The attribute(s) on the left-hand side of the arrow.

  • Key and Non-Key attributes: Attributes are either part of the primary key or not.

  • Partial functional dependency: A non-key attribute is functionally dependent on part of the primary key.

  • Transitive dependency: A functional dependency between two or more non-key attributes.

Armstrong’s Axioms

  • Functional dependencies can be identified using Armstrong’s Axioms:

    1. Reflexivity: If BAB \subseteq A, then ABA \rightarrow B

      • Example: Student_ID, name -> name

    2. Augmentation: If ABA \rightarrow B, then ACBCAC \rightarrow BC

      • Example: StudentID -> name => StudentID, surname -> name, surname

    3. Transitivity: If ABA \rightarrow B and BCB \rightarrow C, then ACA \rightarrow C

      • Example: ID -> birthdate and birthdate -> age then ID -> age

  • Given A=X<em>1,X</em>2,,X<em>nA = {X<em>1, X</em>2, …, X<em>n} and B=(Y</em>1,Y<em>2,,Y</em>n)B = (Y</em>1, Y<em>2, …, Y</em>n)

Steps in Normalization

  • First Normal Form (1NF): Remove repeating groups.

  • Second Normal Form (2NF): Remove partial dependencies.

  • Third Normal Form (3NF): Remove transitive dependencies.

First Normal Form

  • Remove Repeating Groups: Repeating groups of attributes cannot be represented in a flat, two-dimensional table.

  • Remove cells with multiple values (keep atomic data).

  • Break them into two and use PK/FK to connect

  • Example:

    • From Order-Item (Order#, Customer#, (Item#, Desc, Qty)) to

    • Order-Item (Order#, Item#, Desc, Qty) and Order (Order#, Customer#)

Second Normal Form

  • Remove Partial Dependencies: A non-key attribute cannot be identified by part of a composite key.

  • Example:

    • From Order-Item (Order#, Item#, Desc, Qty) to

    • Order-Item (Order#, Item#, Qty) and Item (Item#, Desc)

Partial Dependency Anomalies

  • Example Table: Order-Item (Order#, Item#, Desc, Qty)

  • Anomalies include:

    • Update Anomaly: Changing item description requires updating multiple rows.

    • Deletion Anomaly: Data for the last item is lost when the last order for that item is deleted.

    • Insertion Anomaly: Cannot add a new item until it is ordered.

Solution to these Anomalies

  • Split the table into two:

    • Order-Item (Order#, Item#, Qty)

    • Item (Item#, Desc)

  • This resolves the update, delete, and insert anomalies.

Third Normal Form

  • Remove Transitive Dependencies: A non-key attribute cannot be identified by another non-key attribute.

  • Example:

    • From Employee (Emp#, Ename, Dept#, Dname) to

    • Employee (Emp#, Ename, Dept#) and Department (Dept#, Dname)

Transitive Dependency Anomalies

  • Example Table: Employee (Emp#, Ename, Dept#, Dname)

  • Anomalies include:

    • Update Anomaly: Changing department name requires updating multiple rows.

    • Deletion Anomaly: Data for a department is lost when the last employee for that department is deleted.

    • Insertion Anomaly: Cannot add a new department until an employee is allocated to it.

Solution to these Anomalies

  • Split the table into two:

    • Employee (Emp#, Ename, Dept#)

    • Department (Dept#, Dname)

  • This resolves the update, delete, and insert anomalies.

Summary of Normalization

  • First Normal Form: Remove repeating groups.

  • Second Normal Form: Remove partial dependencies.

  • Third Normal Form: Remove transitive dependencies.

Normalization vs Denormalization

  • Normalization:

    • Reduces redundancy and allows users to insert, modify, and delete rows without errors or inconsistencies (anomalies).

  • Denormalization:

    • Increases query speed but requires extra work to maintain data consistency during updates.

    • May be used to improve the performance of time-critical operations.

Examinable Content

  • Normalization Process

  • Anomalies

  • Functional dependencies