lect17-annotated
Page 1: Announcements & Resources
Lucy's tutorial for the Web Front-End option of the project is scheduled for today at 3 PM.
Check Ed post for room confirmation.
3 additional TA-led recordings from last fall are available on CS W4111.001 CourseWorks.
Course Title: Introduction to Databases Fall 2024, Computer Science Department, Columbia University.
Page 2: Schema Refinement and Normal Forms
Continuing discussion on Schema Refinement and Normal Forms.
Page 3: Database Design Steps
Real-world understanding: Understanding the problem domain.
E/R model: Creating an Entity-Relationship model.
Relational schema: Converting the E/R model to a relational schema.
Better relational schema: Refining the schema for optimization.
Relational DBMS: Implementing the schema in a Relational Database Management System.
Normalization: A design theory for relations aimed to eliminate redundancy during the transition from Step 3 to Step 4.
Page 4: Anomalies in MovieBad Schema
Schema: MovieBad(title, year, length, starName)
Types of anomalies:
Redundancy: Length for a movie stored multiple times.
Update anomaly: Inconsistencies if the length of a movie is updated improperly.
Deletion anomaly: Deleting a star might lead to loss of movie length data.
Page 5: Functional Dependencies (FDs)
Definition: A functional dependency is a constraint that relates attributes of a relation based on real-world knowledge.
Example: In the Movie(title, year, length) table, the functional dependency title, year → length holds.
It means if two Movie tuples agree on title and year, they must agree on length as well.
Page 6: Reasoning About FDs with Armstrong's Axioms
To compute the closure of a set of FDs, Armstrong’s axioms can be used:
Reflexivity: If Y ⊆ X, then X → Y.
Augmentation: If X → Y holds, then X U Z → Y U Z.
Transitivity: If X → Y and Y → Z, then X → Z.
These axioms provide a complete and sound set of inference rules for FDs.
Page 7: Closure of Set of Attributes
The closure of a set of attributes {A1, A2, …, Ak}, denoted {A1, …, Ak}+, consists of all attributes B in a relation R such that A1, A2, …, Ak → B holds.
Page 8: Closure of Set of Attributes: An Algorithm
Steps to compute the closure {A1, …, Ak}+, given a relation R and FDs S:
Page 9: Algorithm Steps for Closure Calculation
Start with {A1, …, Ak}.
Repeat until no change:
If the current set of attributes includes the full left side of an FD in S, then add all right side attributes of that FD to the set.
Return the set of attributes after no changes.
Page 10: Using Closure: Example Scenario
Relation: R(A, B, C, D, E) with FDs:
A → B
B → C
CD → E
Question: Does AD → E hold?
Page 11: Check Closure for Relation Example
For R(A, B, C, D, E): Compute {A, D}+ to check if E is included in the closure based on existing FDs.
Page 12: Step-By-Step Closure Computation
Starting set: {A, D}.
Closure includes A, so add B: closure = {A, B, D}.
Page 13: Continuing Closure Computation
Closure includes B, so add C: closure = {A, B, C, D}.
Closure also includes CD, so add E: closure = {A, B, C, D, E}.
No further changes possible; hence the final closure is {A, D}+ = {A, B, C, D, E}.
Page 14: Understanding Functional Dependencies & Keys
A relation R(A1, A2, …, An) has a key K if:
K functionally determines all attributes of R (K → A1, A2, …, An).
No strict subset of K satisfies the first condition, meaning K is minimal.
Page 15: Key Definitions Using Functional Dependencies
If any attribute set K' meets Condition 1, K' is considered a superkey.
Every key is a superkey, but not every superkey is a key due to the minimality condition.
Page 16: Checking Keys with Functional Dependencies
Relation: EmpsBad (ssn, name, did, rank, sal) using FDs S:
FDs from constraints: { ssn → name, did, rank; did, rank → sal }.
Task: Check if {ssn} is a key.
Page 17: Determining Keys Algorithm for a Relation
To find all keys for a relation with given FDs:
Identify every attribute set K such that:
K → all attributes of R.
K is minimal (no strict subset works).
Page 18: Finding All Keys for Relations
Method to find keys involves using attribute closure algorithm for each combination of attributes in R.
Example with EmpsBad relation and corresponding FDs to deduce keys.
Page 19: Identifying All Keys for a Relation Continued
The process identifies sets that meet the conditions of functional dependency and minimality.
Page 20: Algorithm for Finding All Keys
Iterate through all sets of attributes, compute their closures, and check if all attributes of R are functionally determined.
Page 21: Evaluating Keys in Relation Examples
Given FDs, determine potential keys. Consider all valid combinations of attribute sets.
Page 22: Evaluating Functional Dependencies in EmpsBad
Evaluate FDs for EmpsBad and check closure for potential attributes to find keys.
Page 23: Evaluating Relation Schema Quality
Normal forms help determine the quality of relation schemas in avoiding redundancy.
Boyce-Codd Normal Form (BCNF) recognizes effective schemas if FDs satisfy certain criteria.
Page 24: Criteria for BCNF Compliance
A relation is in BCNF if, for every FDs X → A, either:
X → A is trivial (A is part of X), or
X is a superkey.
Page 25: BCNF Status of EmpsBad Relation
Evaluate the schema EmpsBad with given FDs to determine its BCNF status.
Page 26: Checking Key and Functional Dependency in EmpsBad
Identify keys and analyze functional dependency to ascertain if relation is in BCNF and its minimality.
Page 27: Is EmpsBad in BCNF? Analysis
The functional FD did, rank → sal indicates EmpsBad is not in BCNF due to nontrivial findings.
Page 28: Identifying Redundancy in EmpsBad
If a key is not present on the left side of a nontrivial functional dependency, anomalies can occur, leading to redundancy.
Page 29: Another Example with MovieBad
Review schema for MovieBad where title, year → length is established.
Page 30: MovieBad Schema Analysis
The key {title, year, starName} is identified, but the schema fails BCNF due to missing attribute in the key dependencies.
Page 31: Handling BCNF Violations
When a relation is not in BCNF, it can be decomposed into smaller relations to reduce redundancy.
Page 32: The Process of Decomposition
Decompose relations using offending FDs to split schemas such as EmpsBad and MovieBad into more manageable and compliant structures.
Page 33: BCNF Decomposition Algorithm
Steps to decompose a relation into BCNF involve understanding relation schema and functional dependencies.
Page 34: Detailed BCNF Decomposition Steps
Repeat decomposition for all BCNF violations, ensuring to compute FDs for new relations.
Page 35: Evaluating Decomposition Validity
Focus on FDs relevant to schema rather than all associations post-decomposition to maintain clarity in final relations.
Page 36: BCNF Decomposition Quality Assessment
The algorithm guarantees lossless decompositions; it reconstructs original relations without losing data integrity.
Page 37: Valid Relation Decomposition Requirements
A valid decomposition maintains a union of attributes encompassing the original relation while adding no extraneous attributes.
Page 38: Example of Successful Valid Decomposition
Break down MovieBad into Movie and MovieStar validly while preserving all original attributes.
Page 39: Evaluating Decomposition Effectiveness
Comparison of valid decompositions with shared attributes helps evaluate appropriateness.
Page 40: Natural Join Concerns with Decomposed Relations
Evaluate potential spurious tuples during natural joins, which can emerge from inadequate attribute sharing between decompositions.
Page 41: Inspecting Spurious Data through Natural Joins
Demonstrate tuple redundancy and incorrect data representation that can occur from poorly structured decompositions.
Page 42: Identifying Issues in Reconstruction Post-Decomposition
Address how inadequate attribute sharing in decomposed relations obstructs natural join efficiency.
Page 43: Lossless Join Decompositions Overview
Discuss conditions under which R decomposed into R1 and R2 retains integrity with respect to functional dependencies.
Page 44: Theorem for Lossless Join Decomposition
A decomposition R into R1 and R2 is lossless if the closure of functional dependencies includes attributes from both R1 and R2.
Page 45: Efficiency Considerations in Decompositions
Focus on efficiency when checking FDs over decomposed relations, such as the need for joining smaller parts during validation of constraints.
Page 46: Balancing BCNF and Dependency Preservation
While ensuring lossless joins, sometimes redundancy may occur, necessitating a broader perspective through 3NF to accommodate functional dependencies.
Page 47: Summary of Schema Refinement Concepts
Aim for BCNF to eliminate redundancies but recognize possible necessity for 3NF tolerating minimal redundancy while ensuring efficient data management.