Database Design Theory: Introduction to Normalization
Normalization and Database Design
Chapter 14 Outline
- Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
- Informal Design Guidelines for Relation Schemas
- Functional Dependencies
- Normal Forms Based on Primary Keys
- General Definitions of Second and Third Normal Forms
- Boyce-Codd Normal Form
- Multivalued Dependency and Fourth Normal Form
- Join Dependencies and Fifth Normal Form
Introduction
- Levels for discussing the goodness of relation schemas:
- Logical (or conceptual) level
- Implementation (or physical storage) level
- Approaches to database design:
- Bottom-up
- Top-down
Informal Design Guidelines for Relation Schemas
- Measures of quality:
- Making sure attribute semantics are clear
- Reducing redundant information in tuples
- Reducing NULL values in tuples
- Disallowing the possibility of generating spurious tuples
Imparting Clear Semantics to Attributes in Relations
- Semantics of a relation: Meaning resulting from the interpretation of attribute values in a tuple.
- Easier to explain semantics of a relation indicates a better schema design.
- Guideline 1: Design relation schema so that it is easy to explain its meaning.
- Do not combine attributes from multiple entity types and relationship types into a single relation.
Redundant Information in Tuples and Update Anomalies
- Grouping attributes into relation schemas has a significant effect on storage space.
- Storing natural joins of base relations leads to update anomalies.
- Types of update anomalies:
- Insertion
- Deletion
- Modification
- Guideline 2: Design base relation schemas so that no update anomalies are present in the relations.
- If any anomalies are present:
- Note them clearly
- Make sure that the programs that update the database will operate correctly
- If any anomalies are present:
NULL Values in Tuples
- May group many attributes together into a “fat” relation, which can result in many NULLs.
- Problems with NULLs:
- Wasted storage space
- Problems understanding the meaning
- Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL.
- If NULLs are unavoidable:
- Make sure that they apply in exceptional cases only, not to a majority of tuples
- If NULLs are unavoidable:
Spurious Tuples
- Guideline 4: Design relation schemas to be joined with equality conditions on attributes that are appropriately related.
- Guarantees that no spurious tuples are generated
- Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
Functional Dependencies
- Formal tool for the analysis of relational schemas
- Enables detection and description of problems in precise terms
- Theory of functional dependency
Definition of Functional Dependency
- Constraint between two sets of attributes from the database.
- Property of semantics or meaning of the attributes.
- Legal relation states satisfy the functional dependency constraints.
- Given a populated relation, one cannot determine which FDs hold and which do not, unless the meaning of and relationships among attributes are known.
- Can state that FD does not hold if there are tuples that show violation of such an FD.
Normal Forms Based on Primary Keys
- Normalization process:
- Process for evaluating and correcting table structures to minimize data redundancies
- Reduces data anomalies
- Approaches for relational schema design:
- Perform a conceptual schema design using a conceptual model then map conceptual design into a set of relations
- Design relations based on external knowledge derived from existing implementation of files or forms or reports
Normalization of Relations
- Takes a relation schema through a series of tests to certify whether it satisfies a certain normal form.
- Proceeds in a top-down fashion.
- Normal form tests.
- Normalization is carried out in practice and resulting designs are of high quality and meet the desirable properties stated previously.
- Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF.
- Do not need to normalize to the highest possible normal form.
Definitions of Keys and Attributes Participating in Keys
- Definition of superkey and key
- Candidate key
- If more than one key in a relation schema:
- One is a primary key
- Others are secondary keys
- If more than one key in a relation schema:
First Normal Form (1NF)
- Part of the formal definition of a relation in the basic (flat) relational model
- Only attribute values permitted are single atomic (or indivisible) values
- Techniques to achieve first normal form:
- Remove attribute and place it in a separate relation
- Expand the key
- Use several atomic attributes
Conversion to First Normal Form
*Repeating group: Group of multiple entries of the same type that can exist for any single key attribute occurrence.
- A relational table must not contain repeating groups.
- Normalizing table structure will reduce data redundancies.
- Normalization is a three-step procedure.
- Step 1: Eliminate the Repeating Groups
- Eliminate nulls: each repeating group attribute contains an appropriate data value
- Step 2: Identify the Primary Key
- Must uniquely identify attribute value
- New key must be composed
- Step 3: Identify All Dependencies
- Dependencies are depicted with a diagram
- Step 1: Eliminate the Repeating Groups
Characteristics of First Normal Form
- First normal form describes tabular format:
- All key attributes are defined
- No repeating groups in the table
- All attributes are dependent on the primary key
- Some tables contain partial dependencies,
*Dependencies are based on part of the primary key
*Should be used with caution
Second Normal Form (2NF)
- Based on the concept of full functional dependency versus partial dependency
- Second normalize into a number of 2NF relations
- Nonprime attributes are associated only with part of the primary key on which they are fully functionally dependent
Conversion to Second Normal Form
- A table is in second normal form (2NF) when:
- It is in 1NF and
- It includes no partial dependencies:
- No attribute is dependent on only a portion of the primary key
- In other words:
- Remove vertical redundancy; the same value should not repeat across rows
- Step 1: Make New Tables to Eliminate Partial Dependencies
- Write each key component on a separate line, then write the original (composite) key on the last line.
- Each component will become a key in the new table
- Step 2: Assign Corresponding Dependent Attributes
- Determine attributes that are dependent on other attributes
- At this point, most anomalies have been eliminated
Third Normal Form (3NF)
- Based on the concept of transitive dependency
- Problematic FD:
- Left-hand side is part of the primary key
- Left-hand side is a nonkey attribute
Conversion to Third Normal Form
- Step 1: Make New Tables to Eliminate Transitive Dependencies
- For every transitive dependency, write its determinant as PK for the new table
- Determinant: any attribute whose value determines other values within a row
- Step 2: Reassign Corresponding Dependent Attributes
- Identify attributes dependent on each determinant identified in Step 1.
- Identify the dependency
- Name table to reflect its contents and function
General Definitions of Second and Third Normal Forms
Normal Form | Test | Remedy (Normalization) |
---|---|---|
First (1NF) | Relation should have no multivalued attributes or nested relations. | Form new relations for each multivalued attribute or nested relation. |
Second (2NF) | For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key. | Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it. |
Third (3NF) | Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key. | Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s). |
- Prime attribute: Part of any candidate key will be considered as prime.
- Consider partial, full functional, and transitive dependencies with respect to all candidate keys of a relation.
General Definition of Second Normal Form
- Definition: A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is not partially dependent on any key of R.
General Definition of Third Normal Form
- Definition: A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X A holds in R, either
- (a) X is a superkey of R, or
- (b) A is a prime attribute of R.
Alternative Definition
- A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
- It is fully functionally dependent on every key of R.
- It is nontransitively dependent on every key of R.
Boyce-Codd Normal Form (BCNF)
- Every relation in BCNF is also in 3NF.
- A relation in 3NF is not necessarily in BCNF.
- Difference: Condition which allows A to be prime is absent from BCNF.
- Most relation schemas that are in 3NF are also in BCNF.
Characteristics of BCNF
- Every determinant in a table is a candidate key.
- Has the same characteristics as a primary key but, for some reason, was not chosen to be the primary key.
- When a table contains only one candidate key, the 3NF and the BCNF are equivalent.
- BCNF can be violated only when a table contains more than one candidate key.
- Most designers consider the BCNF as a special case of 3NF.
- A table is in 3NF when it is in 2NF and there are no transitive dependencies.
- A table can be in 3NF and fail to meet BCNF when:
- A nonkey attribute is the determinant of a key attribute.
Fourth Normal Form (4NF)
- A table is in fourth normal form (4NF) when both of the following are true:
- It is in 3NF
- No multiple sets of multivalued dependencies
- 4NF is largely academic if tables conform to the following two rules:
- All attributes are dependent on the primary key and independent of each other
- No row contains two or more multivalued facts about an entity
Multivalued Dependency (MVD)
- Consequence of first normal form (1NF).
- Relations containing nontrivial MVDs are all-key relations.
- Fourth normal form (4NF) is violated when a relation has undesirable multivalued dependencies.
Join Dependencies and Fifth Normal Form (5NF)
- Join dependency:
- Multiway decomposition into fifth normal form (5NF)
- Very peculiar semantic constraint.
- Normalization into 5NF is very rarely done in practice.
- Definition: A join dependency (JD), denoted by JD(R1, R2, …, Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, …, Rn. Hence, for every such r we have
* ( {R1}(r), {R2}(r), …, {Rn}(r)) = r - Definition: A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F^+(that is, implied by F), every Ri is a superkey of R.