Database Design Theory: Introduction to Normalization

Normalization and Database Design

Chapter 14 Outline

  • Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
  • Informal Design Guidelines for Relation Schemas
  • Functional Dependencies
  • Normal Forms Based on Primary Keys
  • General Definitions of Second and Third Normal Forms
  • Boyce-Codd Normal Form
  • Multivalued Dependency and Fourth Normal Form
  • Join Dependencies and Fifth Normal Form

Introduction

  • Levels for discussing the goodness of relation schemas:
    • Logical (or conceptual) level
    • Implementation (or physical storage) level
  • Approaches to database design:
    • Bottom-up
    • Top-down

Informal Design Guidelines for Relation Schemas

  • Measures of quality:
    • Making sure attribute semantics are clear
    • Reducing redundant information in tuples
    • Reducing NULL values in tuples
    • Disallowing the possibility of generating spurious tuples

Imparting Clear Semantics to Attributes in Relations

  • Semantics of a relation: Meaning resulting from the interpretation of attribute values in a tuple.
  • Easier to explain semantics of a relation indicates a better schema design.
  • Guideline 1: Design relation schema so that it is easy to explain its meaning.
  • Do not combine attributes from multiple entity types and relationship types into a single relation.

Redundant Information in Tuples and Update Anomalies

  • Grouping attributes into relation schemas has a significant effect on storage space.
  • Storing natural joins of base relations leads to update anomalies.
  • Types of update anomalies:
    • Insertion
    • Deletion
    • Modification
  • Guideline 2: Design base relation schemas so that no update anomalies are present in the relations.
    • If any anomalies are present:
      • Note them clearly
      • Make sure that the programs that update the database will operate correctly

NULL Values in Tuples

  • May group many attributes together into a “fat” relation, which can result in many NULLs.
  • Problems with NULLs:
    • Wasted storage space
    • Problems understanding the meaning
  • Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL.
    • If NULLs are unavoidable:
      • Make sure that they apply in exceptional cases only, not to a majority of tuples

Spurious Tuples

  • Guideline 4: Design relation schemas to be joined with equality conditions on attributes that are appropriately related.
    • Guarantees that no spurious tuples are generated
  • Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

Functional Dependencies

  • Formal tool for the analysis of relational schemas
  • Enables detection and description of problems in precise terms
  • Theory of functional dependency

Definition of Functional Dependency

  • Constraint between two sets of attributes from the database.
  • Property of semantics or meaning of the attributes.
  • Legal relation states satisfy the functional dependency constraints.
  • Given a populated relation, one cannot determine which FDs hold and which do not, unless the meaning of and relationships among attributes are known.
  • Can state that FD does not hold if there are tuples that show violation of such an FD.

Normal Forms Based on Primary Keys

  • Normalization process:
    • Process for evaluating and correcting table structures to minimize data redundancies
    • Reduces data anomalies
  • Approaches for relational schema design:
    • Perform a conceptual schema design using a conceptual model then map conceptual design into a set of relations
    • Design relations based on external knowledge derived from existing implementation of files or forms or reports

Normalization of Relations

  • Takes a relation schema through a series of tests to certify whether it satisfies a certain normal form.
  • Proceeds in a top-down fashion.
  • Normal form tests.
  • Normalization is carried out in practice and resulting designs are of high quality and meet the desirable properties stated previously.
  • Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF.
  • Do not need to normalize to the highest possible normal form.

Definitions of Keys and Attributes Participating in Keys

  • Definition of superkey and key
  • Candidate key
    • If more than one key in a relation schema:
      • One is a primary key
      • Others are secondary keys

First Normal Form (1NF)

  • Part of the formal definition of a relation in the basic (flat) relational model
  • Only attribute values permitted are single atomic (or indivisible) values
  • Techniques to achieve first normal form:
    • Remove attribute and place it in a separate relation
    • Expand the key
    • Use several atomic attributes

Conversion to First Normal Form

*Repeating group: Group of multiple entries of the same type that can exist for any single key attribute occurrence.

  • A relational table must not contain repeating groups.
  • Normalizing table structure will reduce data redundancies.
  • Normalization is a three-step procedure.
    • Step 1: Eliminate the Repeating Groups
      • Eliminate nulls: each repeating group attribute contains an appropriate data value
    • Step 2: Identify the Primary Key
      • Must uniquely identify attribute value
      • New key must be composed
    • Step 3: Identify All Dependencies
      • Dependencies are depicted with a diagram

Characteristics of First Normal Form

  • First normal form describes tabular format:
    • All key attributes are defined
    • No repeating groups in the table
    • All attributes are dependent on the primary key
  • Some tables contain partial dependencies,
    *Dependencies are based on part of the primary key
    *Should be used with caution

Second Normal Form (2NF)

  • Based on the concept of full functional dependency versus partial dependency
  • Second normalize into a number of 2NF relations
  • Nonprime attributes are associated only with part of the primary key on which they are fully functionally dependent

Conversion to Second Normal Form

  • A table is in second normal form (2NF) when:
    • It is in 1NF and
    • It includes no partial dependencies:
      • No attribute is dependent on only a portion of the primary key
    • In other words:
      • Remove vertical redundancy; the same value should not repeat across rows
  • Step 1: Make New Tables to Eliminate Partial Dependencies
    • Write each key component on a separate line, then write the original (composite) key on the last line.
    • Each component will become a key in the new table
  • Step 2: Assign Corresponding Dependent Attributes
    • Determine attributes that are dependent on other attributes
    • At this point, most anomalies have been eliminated

Third Normal Form (3NF)

  • Based on the concept of transitive dependency
  • Problematic FD:
    • Left-hand side is part of the primary key
    • Left-hand side is a nonkey attribute

Conversion to Third Normal Form

  • Step 1: Make New Tables to Eliminate Transitive Dependencies
    • For every transitive dependency, write its determinant as PK for the new table
    • Determinant: any attribute whose value determines other values within a row
  • Step 2: Reassign Corresponding Dependent Attributes
    • Identify attributes dependent on each determinant identified in Step 1.
    • Identify the dependency
    • Name table to reflect its contents and function

General Definitions of Second and Third Normal Forms

Normal FormTestRemedy (Normalization)
First (1NF)Relation should have no multivalued attributes or nested relations.Form new relations for each multivalued attribute or nested relation.
Second (2NF)For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
Third (3NF)Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
  • Prime attribute: Part of any candidate key will be considered as prime.
  • Consider partial, full functional, and transitive dependencies with respect to all candidate keys of a relation.

General Definition of Second Normal Form

  • Definition: A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is not partially dependent on any key of R.

General Definition of Third Normal Form

  • Definition: A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X A holds in R, either
    • (a) X is a superkey of R, or
    • (b) A is a prime attribute of R.

Alternative Definition

  • A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
    • It is fully functionally dependent on every key of R.
    • It is nontransitively dependent on every key of R.

Boyce-Codd Normal Form (BCNF)

  • Every relation in BCNF is also in 3NF.
  • A relation in 3NF is not necessarily in BCNF.
  • Difference: Condition which allows A to be prime is absent from BCNF.
  • Most relation schemas that are in 3NF are also in BCNF.

Characteristics of BCNF

  • Every determinant in a table is a candidate key.
    • Has the same characteristics as a primary key but, for some reason, was not chosen to be the primary key.
  • When a table contains only one candidate key, the 3NF and the BCNF are equivalent.
  • BCNF can be violated only when a table contains more than one candidate key.
  • Most designers consider the BCNF as a special case of 3NF.
  • A table is in 3NF when it is in 2NF and there are no transitive dependencies.
  • A table can be in 3NF and fail to meet BCNF when:
    • A nonkey attribute is the determinant of a key attribute.

Fourth Normal Form (4NF)

  • A table is in fourth normal form (4NF) when both of the following are true:
    • It is in 3NF
    • No multiple sets of multivalued dependencies
  • 4NF is largely academic if tables conform to the following two rules:
    • All attributes are dependent on the primary key and independent of each other
    • No row contains two or more multivalued facts about an entity

Multivalued Dependency (MVD)

  • Consequence of first normal form (1NF).
  • Relations containing nontrivial MVDs are all-key relations.
  • Fourth normal form (4NF) is violated when a relation has undesirable multivalued dependencies.

Join Dependencies and Fifth Normal Form (5NF)

  • Join dependency:
  • Multiway decomposition into fifth normal form (5NF)
  • Very peculiar semantic constraint.
  • Normalization into 5NF is very rarely done in practice.
  • Definition: A join dependency (JD), denoted by JD(R1, R2, …, Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, …, Rn. Hence, for every such r we have
    * ( {R1}(r), {R2}(r), …, {Rn}(r)) = r
  • Definition: A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F^+(that is, implied by F), every Ri is a superkey of R.