TOPIC 5
Database Systems: Design, Implementation, and Management - Module 6: Normalization of Database Tables
Chapter Objectives
- By the end of this chapter, you should be able to:
- Explain normalization and its role in the database design process.
- Identify each of the normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.
- Explain how normal forms can be transformed from lower normal forms to higher normal forms.
- Apply normalization rules to evaluate and correct table structures.
- Identify situations that require denormalization to generate information efficiently.
- Use a data-modeling checklist to check that the ERD meets a set of minimum requirements.
Database Tables and Normalization
- Normalization is a process for evaluating and correcting table structures to minimize data redundancies.
- Database designers commonly use normalization in the following two situations:
- When designing a new database structure.
- When analyzing the relationships among the attributes within each existing entity to determine whether the structure can be improved through normalization.
Normalization and Database Design
- Normalization should be part of the design process:
- Proposed entities must meet the required normal form before table structures are created.
- Redesigning and modifying databases requires an understanding of normalization principles and procedures, including the following:
- ERD (Entity-Relationship Diagram) is created through an iterative process.
- Normalization focuses on the characteristics of specific entities.
The Need for Normalization
- Eliminates unnecessary or unwanted data redundancies to prevent data anomalies.
- Assigns attributes to tables:
- Uses the concept of functional dependencies to identify which attribute determines other attributes.
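The idea of a functional dependency can be sketched in a few lines of Python: attribute set X functionally determines attribute Y if no two rows agree on X but disagree on Y. The table and attribute names below are hypothetical, chosen only to illustrate the check.

```python
def holds_fd(rows, determinant, dependent):
    """Return True if the determinant attribute(s) functionally
    determine the dependent attribute in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = row[dependent]
        if key in seen and seen[key] != value:
            return False  # same determinant value, different dependent value
        seen[key] = value
    return True

# Hypothetical sample data: STU_NUM determines STU_NAME, but STU_NAME
# does not determine STU_NUM (two students share the name "Smith").
rows = [
    {"STU_NUM": 1, "STU_NAME": "Smith"},
    {"STU_NUM": 2, "STU_NAME": "Jones"},
    {"STU_NUM": 3, "STU_NAME": "Smith"},
]
print(holds_fd(rows, ["STU_NUM"], "STU_NAME"))   # True
print(holds_fd(rows, ["STU_NAME"], "STU_NUM"))   # False
```

Note that passing this check on sample data only shows the dependency is not violated by those rows; a true functional dependency is a statement about the business rules, not the current data.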
The Normalization Process
- The objective of normalization is to ensure that each table conforms to the concept of well-formed relations and has the following characteristics:
- Each table represents a single subject.
- Each row/column intersection contains only one value and not a group of values.
- No data item will be unnecessarily stored in more than one table.
- All non-prime attributes in a table are dependent on the primary key.
- Each table has no insertion, update, or deletion anomalies.
* An objective is to ensure that all tables are in at least 3NF.
* Higher forms, such as 5NF and Domain-Key Normal Form (DKNF), are not likely to be encountered in a business environment.
Knowledge Check Activity 6-1
- Question: What is normalization?
- Answer: Normalization is the process for assigning attributes to entities. Properly executed, the normalization process eliminates uncontrolled data redundancies, thus eliminating the data anomalies and the data integrity problems that are produced by such redundancies.
First Normal Form (1NF)
- The term 1NF describes the tabular format in which the following occur:
- All attributes are dependent on the primary key.
- Each cell must only contain a single value (not a list).
- Each value must be atomic (cannot be subdivided further).
- All rows must be unique (no duplicate rows).
Conversion to First Normal Form (1NF)
- The three core steps for converting to 1NF are:
- Identify the primary key → Each record must be uniquely identifiable.
- Eliminate repeating groups and ensure no multi-valued attributes:
- Ensure each field contains only atomic (single) values.
- Move repeating/multiple values into separate rows or tables.
- Identify all dependencies → Understand relationships between attributes.
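The repeating-group step above can be sketched as follows. This is a minimal illustration with hypothetical student data: each row carries a list of phone numbers, and conversion to 1NF spreads that list across one row per value.

```python
# Hypothetical unnormalized data: each student row carries a list of
# phone numbers (a repeating group), which violates 1NF.
unnormalized = [
    {"STU_NUM": 1, "STU_NAME": "Smith", "PHONES": ["555-0100", "555-0101"]},
    {"STU_NUM": 2, "STU_NAME": "Jones", "PHONES": ["555-0200"]},
]

def to_1nf(rows):
    """Eliminate the repeating group: one row per (student, phone)."""
    result = []
    for row in rows:
        for phone in row["PHONES"]:
            result.append({"STU_NUM": row["STU_NUM"],
                           "STU_NAME": row["STU_NAME"],
                           "PHONE": phone})
    return result

flat = to_1nf(unnormalized)
# The primary key of the resulting 1NF table is (STU_NUM, PHONE).
```

The two original rows become three atomic rows, and every cell now holds a single value.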
Second Normal Form (2NF)
- A table is in 2NF under the following circumstances:
- When it is in 1NF.
- When it includes no partial dependencies:
* If the 1NF has a single-attribute primary key (one column, e.g., StudentID), then the table is automatically in 2NF.
Conversion to Second Normal Form (2NF)
- The core steps for converting 1NF to 2NF:
- Make new tables to eliminate partial dependencies.
- Reassign corresponding dependent attributes.
- Conversion to 2NF occurs only when the 1NF has a composite primary key.
Conversion to Second Normal Form (2NF) - Example
- Composite key: a primary key made up of multiple columns combined (e.g., StudentID + CourseID).
- Example: if StudentName depends on StudentID alone, it is only partially dependent on the composite key (StudentID + CourseID) and must be moved to its own table, leaving no partial dependencies.
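The 2NF decomposition can be sketched in Python. The table and column names are hypothetical, following the slide's StudentID + CourseID example: StudentName depends only on StudentID, so it is split out into its own STUDENT table.

```python
# Hypothetical 1NF table keyed by (StudentID, CourseID); StudentName
# depends only on StudentID -- a partial dependency.
enroll = [
    {"StudentID": 1, "CourseID": "DB101", "StudentName": "Smith", "Grade": "A"},
    {"StudentID": 1, "CourseID": "MA201", "StudentName": "Smith", "Grade": "B"},
    {"StudentID": 2, "CourseID": "DB101", "StudentName": "Jones", "Grade": "A"},
]

# Decompose: move the partially dependent attribute into a STUDENT table
# keyed by StudentID alone.
students = {r["StudentID"]: {"StudentID": r["StudentID"],
                             "StudentName": r["StudentName"]}
            for r in enroll}

# ENROLLMENT keeps the composite key (StudentID, CourseID) with only
# attributes that depend on the whole key.
enrollments = [{"StudentID": r["StudentID"], "CourseID": r["CourseID"],
                "Grade": r["Grade"]} for r in enroll]
```

After the split, each student's name is stored once rather than once per enrollment, removing the redundancy the partial dependency caused.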
Third Normal Form (3NF)
- A table is in 3NF under the following circumstances:
- When it is in 2NF.
- When it contains no transitive dependencies:
* All non-key fields must be determined only by the primary key (or composite key), not by other non-key attributes.
Conversion to Third Normal Form (3NF)
- The core steps for converting to 3NF:
- Make new tables to eliminate transitive dependencies.
- Reassign corresponding dependent attributes.
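A transitive-dependency split can be sketched the same way. The data is hypothetical: EMP_NUM determines DEPT_CODE, and DEPT_CODE in turn determines DEPT_NAME, so DEPT_NAME is moved into a DEPARTMENT table.

```python
# Hypothetical 2NF table: EMP_NUM -> DEPT_CODE -> DEPT_NAME is a
# transitive dependency (DEPT_NAME depends on a non-key attribute).
employees_2nf = [
    {"EMP_NUM": 1, "EMP_NAME": "Lee",  "DEPT_CODE": "ACC", "DEPT_NAME": "Accounting"},
    {"EMP_NUM": 2, "EMP_NAME": "Diaz", "DEPT_CODE": "ACC", "DEPT_NAME": "Accounting"},
    {"EMP_NUM": 3, "EMP_NAME": "Kim",  "DEPT_CODE": "MKT", "DEPT_NAME": "Marketing"},
]

# Decompose: DEPARTMENT stores each DEPT_CODE -> DEPT_NAME pair once;
# EMPLOYEE keeps DEPT_CODE as a foreign key.
departments = {r["DEPT_CODE"]: r["DEPT_NAME"] for r in employees_2nf}
employee = [{"EMP_NUM": r["EMP_NUM"], "EMP_NAME": r["EMP_NAME"],
             "DEPT_CODE": r["DEPT_CODE"]} for r in employees_2nf]
```

Renaming the department now means changing one DEPARTMENT row instead of every employee row, which is exactly the update anomaly 3NF removes.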
Improving the Design (1 of 2)
- The following are various types of issues you need to address to produce a good normalized set of tables:
- Minimize data entry errors.
- Evaluate naming conventions.
- Refine attribute atomicity:
* An atomic attribute is an attribute that cannot be further subdivided.
* Atomicity is this characteristic: an attribute displays atomicity when it cannot be divided into smaller units.
- Identify new attributes.
- Identify new relationships.
Improving the Design (2 of 2)
- The following are various types of issues you need to address to produce a good normalized set of tables (continued):
- Refine primary keys as required for data granularity:
* Granularity refers to the level of detail represented by the values stored in a table’s row.
- Maintain historical accuracy.
- Evaluate using derived attributes.
Knowledge Check Activity 6-2
- Question: When is a table in 3NF?
- Answer: A table is in 3NF when it is in 2NF and it contains no transitive dependencies.
Surrogate Key Considerations
- Surrogate keys are used by designers when the primary key is considered to be unsuitable.
- A surrogate key is a system-defined attribute generally created and managed via the DBMS:
- Usually a numeric value that is automatically incremented for each new row.
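A DBMS-managed surrogate key can be demonstrated with SQLite from Python's standard library. This is a minimal sketch; the table and column names are hypothetical. In SQLite, a column declared `INTEGER PRIMARY KEY` is an alias for the rowid, so the DBMS assigns the next value automatically when none is supplied.

```python
import sqlite3

# In-memory database; prop_id is a surrogate key managed by the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rental_property (
        prop_id  INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        address  TEXT NOT NULL,
        sqft     INTEGER
    )
""")
conn.execute("INSERT INTO rental_property (address, sqft) VALUES (?, ?)",
             ("123 Main St", 1200))
conn.execute("INSERT INTO rental_property (address, sqft) VALUES (?, ?)",
             ("456 Oak Ave", 950))
rows = conn.execute(
    "SELECT prop_id, address FROM rental_property ORDER BY prop_id").fetchall()
print(rows)  # prop_id values 1 and 2 were generated automatically
```

Other systems expose the same idea under different syntax (e.g., identity or auto-increment columns); the point is that the application never invents the key value itself.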
Higher-Level Normal Forms
- Higher normal forms are sometimes useful for removing complex redundancy, preventing advanced anomalies:
- Boyce-Codd normal form (BCNF).
- Fourth normal form (4NF).
- Fifth normal form (5NF).
The Boyce-Codd Normal Form (BCNF)
- A table is in BCNF when it is in 3NF and every determinant in the table is a candidate key:
- Recall from Chapter 3 that a candidate key has the same characteristics as a primary key but was not chosen to be the primary key.
- When a table contains only one candidate key, the 3NF and the BCNF are equivalent.
- BCNF can be violated only when the table contains more than one candidate key.
- BCNF can be viewed as a stricter special case of 3NF: every table in BCNF is also in 3NF, but a 3NF table with multiple overlapping candidate keys may fail BCNF.
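The BCNF condition, that every determinant must be a candidate key, can be stated directly as a check over a table's declared functional dependencies. The FDs and keys below are hypothetical, chosen only to show one passing and one failing case.

```python
def is_bcnf(fds, candidate_keys):
    """fds: list of (determinant_attrs, dependent_attr) pairs.
    BCNF holds when every determinant is one of the candidate keys."""
    keys = {frozenset(k) for k in candidate_keys}
    return all(frozenset(det) in keys for det, _ in fds)

# Hypothetical STUDENT_CLASS(STU_ID, CLASS_CODE, ENROLL_GRADE, STAFF_ID):
#   (STU_ID, CLASS_CODE) -> ENROLL_GRADE   determinant IS the key
#   STAFF_ID -> CLASS_CODE                 determinant is NOT a key
fds = [(("STU_ID", "CLASS_CODE"), "ENROLL_GRADE"),
       (("STAFF_ID",), "CLASS_CODE")]
print(is_bcnf(fds, [("STU_ID", "CLASS_CODE")]))  # False: STAFF_ID violates BCNF
```

In practice the fix is the usual one: move the violating dependency (here STAFF_ID → CLASS_CODE) into its own table keyed by the determinant.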
Fourth Normal Form (4NF)
- The discussion of 4NF is academic if you ensure that your tables conform to the following two rules:
- All attributes must depend on the primary key but must be independent of each other.
- No row may contain two or more multivalued facts about an entity.
- A table is in 4NF under the following circumstances:
- When it is in 3NF.
- When it has no multivalued dependencies.
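A multivalued-dependency split can be sketched as follows. The data is hypothetical: one employee's volunteer organizations and service assignments are independent facts, so storing them together forces every combination to appear as a row.

```python
# Hypothetical table mixing two independent multivalued facts about an
# employee -- a 4NF violation: 2 organizations x 2 services = 4 rows.
rows = [
    {"EMP_NUM": 1, "ORG_CODE": "Red Cross", "SERVICE_CODE": "Driver"},
    {"EMP_NUM": 1, "ORG_CODE": "Red Cross", "SERVICE_CODE": "Cook"},
    {"EMP_NUM": 1, "ORG_CODE": "Scouts",    "SERVICE_CODE": "Driver"},
    {"EMP_NUM": 1, "ORG_CODE": "Scouts",    "SERVICE_CODE": "Cook"},
]

# Decompose into one table per independent multivalued fact.
assign_org = {(r["EMP_NUM"], r["ORG_CODE"]) for r in rows}
assign_svc = {(r["EMP_NUM"], r["SERVICE_CODE"]) for r in rows}
# Four redundant rows become 2 + 2 rows with no information lost.
```

Adding a third organization to the combined table would require two new rows (one per service); in the decomposed form it requires exactly one.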
Fifth Normal Form (5NF)
- Fifth normal form, also known as project join normal form (PJNF), addresses the issue where a table cannot be decomposed anymore without losing data or creating incorrect information:
- A decomposition is lossless when joining the decomposed tables recreates the original table exactly, with no missing and no spurious rows.
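Losslessness can be checked concretely: project a table onto two overlapping column sets, natural-join the projections, and compare with the original. The supplier/part/city data below is hypothetical; since supplier determines city, this particular decomposition is lossless.

```python
# Hypothetical table: (supplier, part, city), where supplier -> city.
original = {
    ("S1", "P1", "Paris"),
    ("S2", "P1", "London"),
}

# Project onto (supplier, part) and (supplier, city).
sp = {(s, p) for (s, p, c) in original}
sc = {(s, c) for (s, p, c) in original}

# Natural join on the shared supplier column.
rejoined = {(s, p, c) for (s, p) in sp for (s2, c) in sc if s == s2}
print(rejoined == original)  # True: no rows lost, no spurious rows
```

If the shared column were not a determinant for one side (e.g., joining on part instead of supplier), the rejoined set would contain spurious rows, which is exactly the failure 5NF analysis guards against.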
- Higher normal forms can provide value; however, the value is limited by the additional processing necessary to work with the data.
- The lower normal forms are generally highly desirable and should always be considered during the database design process.
Denormalization
- Defects in unnormalized tables include the following:
- Data updates are less efficient because tables are larger.
- Indexing is more cumbersome.
- Unnormalized database tables often lead to various data redundancy disasters in production databases.
- Important database design goals include the following:
- Creation of normalized relations.
- Considering processing requirements and speed:
* A problem with normalization is that as tables are decomposed to conform to normalization requirements, the number of database tables expands.
* Joining a larger number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing system speed.
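The trade-off above can be sketched in miniature with hypothetical data: the normalized design joins EMPLOYEE to DEPARTMENT at query time, while a denormalized copy stores the department name redundantly in each employee row to avoid the join.

```python
# Normalized design: department names live in one lookup table.
employee = [(1, "Lee", "ACC"), (2, "Kim", "MKT")]
department = {"ACC": "Accounting", "MKT": "Marketing"}

# Query against the normalized tables requires a join (here, a lookup
# per employee row).
report = [(num, name, department[code]) for (num, name, code) in employee]

# Denormalized copy: the name is duplicated into each employee row up
# front, trading redundancy (and update anomalies) for faster reads.
employee_denorm = [(num, name, code, department[code])
                   for (num, name, code) in employee]
```

Renaming a department in the denormalized table now means touching every affected employee row, which is precisely the anomaly normalization eliminated; denormalization should therefore be a deliberate, documented exception.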
Data-Modeling Checklist (1 of 5)
- Business rules:
- Properly document and verify all business rules with the end users.
- Ensure that all business rules are written precisely, clearly, and simply:
* The business rules must help identify entities, attributes, relationships, and constraints.
- Identify the source of all business rules and ensure that each business rule is justified, dated, and signed off by an approving authority.
Data-Modeling Checklist (2 of 5)
- Data modeling:
- Naming conventions: all names should be limited in length (database-dependent size).
- Entity names:
* Should be nouns that are familiar to business and should be short and meaningful.
* Should document abbreviations, synonyms, and aliases for each entity.
* Should be unique within the model.
* Composite entities may include a combination of abbreviated names of the entities linked through the composite entity.
Data-Modeling Checklist (3 of 5)
- Data modeling (continued):
- Entities:
* Each entity should represent a single subject.
* Each entity should represent a set of distinguishable entity instances.
* All entities should be in 3NF or higher. Any entities below 3NF should be justified.
* The granularity of the entity instance should be clearly defined.
* PK should be clearly defined and support the selected data granularity.
Data-Modeling Checklist (4 of 5)
- Data modeling (continued):
- Attributes:
* Should be simple and single-valued (atomic data).
* Should document default values, constraints, synonyms, and aliases.
* Derived attributes should be clearly identified and include source(s).
* Should not be redundant unless this is required for transaction accuracy, performance, or maintaining a history.
* Nonkey attributes must be fully dependent on the PK attribute.
Data-Modeling Checklist (5 of 5)
- Data modeling (continued):
- Relationships:
* Should clearly identify relationship participants.
* Should clearly define participation, connectivity, and document cardinality.
- ER model:
* Should be validated against expected processes: inserts, updates, and deletions.
* Should evaluate where, when, and how to maintain a history.
* Should not contain redundant relationships except as required (see attributes).
* Should minimize data redundancy to ensure single-place updates.