Normalization in Database Design Study Notes

Introduction to Normalization

  • Introduction to the concept of normalization in database logical design.
  • Link to the previous week's content.
    • Review of the seven steps to convert ER diagrams to relational tables.

Review of Last Week's Content

Conversion of Entity-Relationship (ER) Diagrams

  • Regular Entity:

    • Create a table; add all attributes as columns.
    • Entity identifier becomes the primary key.
  • Weak Entity:

    • Create a table with the weak entity.
    • Use identifier of the strong entity along with the weak entity’s partial identifier as a composite primary key.
  • Relationships based on Cardinality:

    • One-to-One Relationship: Include primary key from one side as a foreign key in the other side’s table.
    • Many-to-Many Relationship: Create a new table for this relationship; include primary IDs from both sides as foreign keys.
    • Associative Entity:
      • Create a table; if it has an identifier, use it as a natural primary key.
      • If no identifier, use a combination of primary keys from connected entities as a composite primary key.
    • Ternary/Ennary Relationships: Create a new table for the relationships.
    • Subtype and Supertype: Create separate tables for each subtype and supertype, incorporating the supertype’s primary key as a foreign key in the subtype table.

Structure of Well-Structured Relations

  • Emphasis on starting the database design with an ER diagram prior to table creation.
  • If tables are created without careful design, normalization is necessary to resolve potential redundancies.

Introduction to Data Normalization

Definition of Normalization

  • Normalization is the process of deciding which attributes to group together in relations.
  • Objective: Minimize data redundancy and avoid anomalies.

Understanding Well-Structured Relations

  • A well-structured relation minimizes data redundancy and avoids anomalies, including:
    • Insertion anomaly
    • Deletion anomaly
    • Modification anomaly

Types of Anomalies

1. Insertion Anomaly

  • Occurs when adding a new row leads to data duplication.

2. Deletion Anomaly

  • Deleting a row leads to unintended loss of data.

3. Modification Anomaly

  • Modifying a row necessitates updates to other rows to avoid inconsistency.

Design Principles of Well-Structured Relations

  • General rule of thumb:
    • A table should correspond to at most one entity.

Examples and Case Studies

Employee Course Table Example

  • Description: Details of employees and courses they completed.
    • Question about its relational integrity based on uniqueness of rows.
  • Composite Primary Key: Employee ID and Course Title ensure unique records.
  • Highlighted the need for normalization due to the anomalies presented by this table structure.

Detail on Anomaly Examples

  • Insertion Anomaly:
    • New employee with no completed courses cannot be added.
  • Deletion Anomaly:
    • Deleting an employee removes course information when no other entries would retain it.
  • Modification Anomaly:
    • Updating an employee’s data requires consistency across multiple entries.

Process of Normalization

  • Overview of normalization steps and requirements for each normal form.

First Normal Form (1NF)

  • Achieved by removing multivalued attributes.
  • Every attribute must be atomic; must include a defined primary key.

Second Normal Form (2NF)

  • After achieving 1NF:
    • Remove partial dependencies.
    • A non-key attribute should depend on the entire primary key, not just part of it.

Third Normal Form (3NF)

  • After achieving 2NF:
    • Remove transitive dependencies (where a non-key attribute depends on another non-key attribute).
  • A well-structured database should ideally reach 3NF.

Higher Normal Forms

  • Boyce-Codd Normal Form (BCNF), Fourth (4NF), and Fifth Normal Form (5NF) are discussed but not often applied for the well-structured relations in this course.

Functional Dependencies

Definition

  • A constraint between two attributes where one attribute's value is dependent on another's.
  • Example: Employee ID determines salary.

Candidate Key

  • A potential unique identifier for each row; may have multiple candidate keys but one primary key.

Steps to Achieve Normal Forms

Example Case Study of a Relation into Normal Forms

Initial Table State:
  • Identified primary keys: Composite of Student ID and Course ID.
  • Functional dependencies identified (partial, transitive, full).
Transformation Steps:
  • Steps to break tables into smaller relations to meet 2NF and 3NF requirements.
  • 2NF involves removing partial dependencies.
  • 3NF involves addressing transitive dependencies.

Referencing and Referential Integrity

  • Overview of defining relationships among tables through foreign keys and maintaining data integrity with primary keys.

Issues During Relation Merging

Potential Problems

  • Synonyms: Same concept described by different names in different tables.
  • Homonyms: Same name for different concepts across tables.
  • Transitive Dependencies: Dependencies that must be resolved during merging.
  • Implicit subtype/supertype relationships: Devise structures that recognize such relationships during the merging process.

Types of Keys in Database Design

  • Primary Key: Uniquely identifies a row in a table.
  • Foreign Key: Establishes relationships between tables.
  • Candidate Key: Potential keys for the primary key.
  • Enterprise Key: Unique across the entire database, used to represent all objects of interest.

Conclusion

  • Summary of the day's lessons on normalization:
    • Importance of using normalization to derive well-structured relations.
    • Understanding and applying functional dependencies is critical to reaching normal forms.
    • Recognition of key types and relationship integrity is vital for effective database design.

Assignment and Course Guidance

  • Reminder regarding upcoming assignments and group formation tasks.
  • Installation guide for MySQL server and workbench will be distributed shortly.