lect17-annotated

Page 1: Announcements & Resources

  • Lucy's tutorial for the Web Front-End option of the project is scheduled for today at 3 PM.

  • Check Ed post for room confirmation.

  • 3 additional TA-led recordings from last fall are available on CS W4111.001 CourseWorks.

  • Course Title: Introduction to Databases Fall 2024, Computer Science Department, Columbia University.

Page 2: Schema Refinement and Normal Forms

  • Continuing discussion on Schema Refinement and Normal Forms.

Page 3: Database Design Steps

  1. Real-world understanding: Understanding the problem domain.

  2. E/R model: Creating an Entity-Relationship model.

  3. Relational schema: Converting the E/R model to a relational schema.

  4. Better relational schema: Refining the schema for optimization.

  5. Relational DBMS: Implementing the schema in a Relational Database Management System.

  • Normalization: A design theory for relations aimed to eliminate redundancy during the transition from Step 3 to Step 4.

Page 4: Anomalies in MovieBad Schema

  • Schema: MovieBad(title, year, length, starName)

  • Types of anomalies:

    • Redundancy: Length for a movie stored multiple times.

    • Update anomaly: Inconsistencies if the length of a movie is updated improperly.

    • Deletion anomaly: Deleting a star might lead to loss of movie length data.

Page 5: Functional Dependencies (FDs)

  • Definition: A functional dependency is a constraint that relates attributes of a relation based on real-world knowledge.

  • Example: In the Movie(title, year, length) table, the functional dependency title, year → length holds.

    • It means if two Movie tuples agree on title and year, they must agree on length as well.

Page 6: Reasoning About FDs with Armstrong's Axioms

  • To compute the closure of a set of FDs, Armstrong’s axioms can be used:

    • Reflexivity: If Y ⊆ X, then X → Y.

    • Augmentation: If X → Y holds, then X U Z → Y U Z.

    • Transitivity: If X → Y and Y → Z, then X → Z.

  • These axioms provide a complete and sound set of inference rules for FDs.

Page 7: Closure of Set of Attributes

  • The closure of a set of attributes {A1, A2, …, Ak}, denoted {A1, …, Ak}+, consists of all attributes B in a relation R such that A1, A2, …, Ak → B holds.

Page 8: Closure of Set of Attributes: An Algorithm

  • Steps to compute the closure {A1, …, Ak}+, given a relation R and FDs S:

Page 9: Algorithm Steps for Closure Calculation

  • Start with {A1, …, Ak}.

  • Repeat until no change:

    • If the current set of attributes includes the full left side of an FD in S, then add all right side attributes of that FD to the set.

  • Return the set of attributes after no changes.

Page 10: Using Closure: Example Scenario

  • Relation: R(A, B, C, D, E) with FDs:

    1. A → B

    2. B → C

    3. CD → E

  • Question: Does AD → E hold?

Page 11: Check Closure for Relation Example

  • For R(A, B, C, D, E): Compute {A, D}+ to check if E is included in the closure based on existing FDs.

Page 12: Step-By-Step Closure Computation

  • Starting set: {A, D}.

  • Closure includes A, so add B: closure = {A, B, D}.

Page 13: Continuing Closure Computation

  • Closure includes B, so add C: closure = {A, B, C, D}.

  • Closure also includes CD, so add E: closure = {A, B, C, D, E}.

  • No further changes possible; hence the final closure is {A, D}+ = {A, B, C, D, E}.

Page 14: Understanding Functional Dependencies & Keys

  • A relation R(A1, A2, …, An) has a key K if:

    1. K functionally determines all attributes of R (K → A1, A2, …, An).

    2. No strict subset of K satisfies the first condition, meaning K is minimal.

Page 15: Key Definitions Using Functional Dependencies

  • If any attribute set K' meets Condition 1, K' is considered a superkey.

  • Every key is a superkey, but not every superkey is a key due to the minimality condition.

Page 16: Checking Keys with Functional Dependencies

  • Relation: EmpsBad (ssn, name, did, rank, sal) using FDs S:

    • FDs from constraints: { ssn → name, did, rank; did, rank → sal }.

  • Task: Check if {ssn} is a key.

Page 17: Determining Keys Algorithm for a Relation

  • To find all keys for a relation with given FDs:

  • Identify every attribute set K such that:

    1. K → all attributes of R.

    2. K is minimal (no strict subset works).

Page 18: Finding All Keys for Relations

  • Method to find keys involves using attribute closure algorithm for each combination of attributes in R.

  • Example with EmpsBad relation and corresponding FDs to deduce keys.

Page 19: Identifying All Keys for a Relation Continued

  • The process identifies sets that meet the conditions of functional dependency and minimality.

Page 20: Algorithm for Finding All Keys

  • Iterate through all sets of attributes, compute their closures, and check if all attributes of R are functionally determined.

Page 21: Evaluating Keys in Relation Examples

  • Given FDs, determine potential keys. Consider all valid combinations of attribute sets.

Page 22: Evaluating Functional Dependencies in EmpsBad

  • Evaluate FDs for EmpsBad and check closure for potential attributes to find keys.

Page 23: Evaluating Relation Schema Quality

  • Normal forms help determine the quality of relation schemas in avoiding redundancy.

  • Boyce-Codd Normal Form (BCNF) recognizes effective schemas if FDs satisfy certain criteria.

Page 24: Criteria for BCNF Compliance

  • A relation is in BCNF if, for every FDs X → A, either:

    • X → A is trivial (A is part of X), or

    • X is a superkey.

Page 25: BCNF Status of EmpsBad Relation

  • Evaluate the schema EmpsBad with given FDs to determine its BCNF status.

Page 26: Checking Key and Functional Dependency in EmpsBad

  • Identify keys and analyze functional dependency to ascertain if relation is in BCNF and its minimality.

Page 27: Is EmpsBad in BCNF? Analysis

  • The functional FD did, rank → sal indicates EmpsBad is not in BCNF due to nontrivial findings.

Page 28: Identifying Redundancy in EmpsBad

  • If a key is not present on the left side of a nontrivial functional dependency, anomalies can occur, leading to redundancy.

Page 29: Another Example with MovieBad

  • Review schema for MovieBad where title, year → length is established.

Page 30: MovieBad Schema Analysis

  • The key {title, year, starName} is identified, but the schema fails BCNF due to missing attribute in the key dependencies.

Page 31: Handling BCNF Violations

  • When a relation is not in BCNF, it can be decomposed into smaller relations to reduce redundancy.

Page 32: The Process of Decomposition

  • Decompose relations using offending FDs to split schemas such as EmpsBad and MovieBad into more manageable and compliant structures.

Page 33: BCNF Decomposition Algorithm

  • Steps to decompose a relation into BCNF involve understanding relation schema and functional dependencies.

Page 34: Detailed BCNF Decomposition Steps

  • Repeat decomposition for all BCNF violations, ensuring to compute FDs for new relations.

Page 35: Evaluating Decomposition Validity

  • Focus on FDs relevant to schema rather than all associations post-decomposition to maintain clarity in final relations.

Page 36: BCNF Decomposition Quality Assessment

  • The algorithm guarantees lossless decompositions; it reconstructs original relations without losing data integrity.

Page 37: Valid Relation Decomposition Requirements

  • A valid decomposition maintains a union of attributes encompassing the original relation while adding no extraneous attributes.

Page 38: Example of Successful Valid Decomposition

  • Break down MovieBad into Movie and MovieStar validly while preserving all original attributes.

Page 39: Evaluating Decomposition Effectiveness

  • Comparison of valid decompositions with shared attributes helps evaluate appropriateness.

Page 40: Natural Join Concerns with Decomposed Relations

  • Evaluate potential spurious tuples during natural joins, which can emerge from inadequate attribute sharing between decompositions.

Page 41: Inspecting Spurious Data through Natural Joins

  • Demonstrate tuple redundancy and incorrect data representation that can occur from poorly structured decompositions.

Page 42: Identifying Issues in Reconstruction Post-Decomposition

  • Address how inadequate attribute sharing in decomposed relations obstructs natural join efficiency.

Page 43: Lossless Join Decompositions Overview

  • Discuss conditions under which R decomposed into R1 and R2 retains integrity with respect to functional dependencies.

Page 44: Theorem for Lossless Join Decomposition

  • A decomposition R into R1 and R2 is lossless if the closure of functional dependencies includes attributes from both R1 and R2.

Page 45: Efficiency Considerations in Decompositions

  • Focus on efficiency when checking FDs over decomposed relations, such as the need for joining smaller parts during validation of constraints.

Page 46: Balancing BCNF and Dependency Preservation

  • While ensuring lossless joins, sometimes redundancy may occur, necessitating a broader perspective through 3NF to accommodate functional dependencies.

Page 47: Summary of Schema Refinement Concepts

  • Aim for BCNF to eliminate redundancies but recognize possible necessity for 3NF tolerating minimal redundancy while ensuring efficient data management.