Chapter 14/15: Normalization/Advanced Normalization
The Purpose of Normalization
Normalization (and the purpose) – technique for producing a set of relations with desirable properties, given the data requirements of an enterprise
Characteristics of a suitable set of relations
Minimal number of attributes
Attributes with close logical relationship
Minimal redundancy of attributes
How Normalization Supports Database Design
Normalization Approaches include:
Use as a bottom-up standalone database design technique
Check validation to check the structure of relations
Data Redundancy
Update anomalies
Insertion – specific attributes cannot be inserted into the database without the presence of other attributes
If we wanted to insert details of new members of staff into the StaffBranch table, we must include the details of the branch at which the staff are to be located, like the address. This could lead to inconsistency
If you wanted to insert details of a new branch that currently has no staff, the attributes related to staff would be null, including staffNo which a primary key (and primary keys cannot be null -- violates entity integrity)
By separating the two, you can add staff at a later date
Deletion – Deletion of specific data inadvertently results in the loss of additional, unintended data
If a member of staff left in the StaffBranch table, the details about the branch would be lost since both Staff information and Branch information are stored in one table
Modification/update – modifying any single piece of data calls for updating other copies
If we wanted to update the address for B003 in the Staffbranch relation, we would have to update all the appropriate tuples. Otherwise the data will become inconsistent
Properties associated with decomposing larger relation into smaller ones:
Lossless-join - when you split a big table into smaller ones, you can always put them back together (join them) without losing any information from the original table.
Dependency preservation - if there are rules (constraints) on the original big table, you can still enforce those rules by checking the smaller tables separately. You don’t need to join the smaller tables together to check if the rules are being followed.
Functional Dependencies
Functional dependency – describes the relationship between attributes
A ---> B: B is functionally dependent on A or A is a determinant for B
Determinant - Attribute or group of attributes on the left side of the arrow
If we know the value of A, then we will always find the same single value for B for every row that has that value of A at any point in time
However, when given a value of B there may be several different values of A
When a functional dependency is present, the dependency specified is called a constraint between the attributes
Full Functional Dependency - assuming A → B, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A
Partial dependency - if some attribute from A is removed and the dependency still holds
Transitive Dependency -If A → B and B → C then C is transitively dependent on A via B
The Process of Normalization
First Normal Form (1NF)
Unnormalized Form (0NF) - A table that contains one or more repeating groups
Repeating group - an attribute or group of attributes within a table that occurs with multiple values for a single occurrence of the nominated key attributes(s) for that table
To remove repeating groups, do one of two things:
Enter appropriate data in the empty columns of rows containing the repeating data
AKA fill in the blanks by duplicating the nonrepeating data. Referred to as flattening the table"
By placing the repeating data, along with a copy of the original key attribute(s), in a separate relation.
First Normal Form (1NF) - Intersection of each row and column contains only one (atomic) value
Second Normal Form (2NF)
2NF - 1NF + every non PK is fully functionally dependent on the PK
Removes partial dependencies by placing them in a new relation along with a copy of their determinant
Third Normal Form (3NF)
3NF - 2NF + Removal of non-primary key attributes that are transitive dependent on the primary key
To normalize: Place attributes in a new relation along with a copy of the determinant