Chapter 8 – Relational Database Model
Learning Objectives
- By the end of Chapter 8 you should be able to:
- Describe the logical structure of the relational model and explain how it differs from physical storage structures.
- List and discuss the basic components of the model (tables, rows, columns, keys, relationships, constraints).
- Manipulate table contents through relational operators (select, project, join, union, etc.).
- Explain the roles of the data dictionary and system catalog in metadata management.
- Identify entities, establish relationships, and translate business rules into the relational schema.
- Evaluate and control data redundancy through proper key usage and table design.
- Explain indexing, including index keys, unique indexes, and their impact on performance and integrity.
A Logical View of Data
- The relational model presents data logically rather than physically.
- Core logical construct is the relation (i.e., table).
- Logical simplicity supports disciplined design methodologies and conceptual independence from storage concerns.
Relational Tables: Fundamental Characteristics
- A table is perceived as a two-dimensional grid (rows × columns).
- Each row (tuple) = one occurrence of an entity.
- Each column = one attribute; each column name is unique within its table.
- Intersection of row & column = one single data value (atomicity).
- All values in the same column must share a common data format / domain.
- Each column is defined over an attribute domain (allowable value range).
- Ordering of rows or columns is immaterial to the DBMS.
- Every table has at least one attribute (or set) that uniquely identifies each row (primary key).
Keys & Dependencies
- Key = attribute(s) that determine other attribute values and ensure unique identification.
- Determination: Knowing value of one attribute lets you determine another.
- Attribute providing the value = determinant.
- Attribute whose value is obtained = dependent.
- Functional dependence: Attribute X functionally determines Y if each X value is associated with precisely one Y value.
- Full functional dependence: The entire composite determinant is necessary; no subset of the determinant attributes suffices.
Types of Keys
- Composite key: two or more attributes combined to serve as a key.
- Key attribute: any attribute that forms part of a key.
- Superkey: any attribute set that uniquely identifies rows (may contain unnecessary attributes).
- Candidate key: minimal (irreducible) superkey—no proper subset is a superkey.
- Primary key (PK): candidate key selected to identify rows; cannot contain nulls.
- Entity integrity rule: every table row must have a non-null and unique PK.
- Foreign key (FK): PK from one table placed in another to represent a relationship.
- Referential integrity rule: every FK is either null (when allowed) or matches an existing PK in the referenced table.
- Secondary key: attribute(s) used strictly for data retrieval (e.g., search, grouping) rather than row identity.
- Null represents the absence of a value; can mean unknown, not yet available, or inapplicable.
Handling Nulls & Column Constraints
- NOT NULL constraint: enforces that each row supplies a value for the column.
- UNIQUE constraint: prevents duplicate values within the column.
- Flags: special codes stored in a column to explicitly denote missing/exception conditions.
Integrity Rules in Practice
- RDBMSs enforce entity & referential integrity automatically; nonetheless, design must honor them to avoid run-time rejection or anomalies.
- Deleting a parent row whose PK is referenced by a child FK is disallowed unless cascading actions or nullification have been predefined.
Relational Algebra & Closure
- Relvar = variable holding a relation; consists of a heading (attribute names) and a body (tuples).
- Closure property: any relational algebra operation produces a new relation that can itself be queried.
Core Unary Operators
- Select (Restrict): horizontal subset—filters rows satisfying a predicate.
- Project: vertical subset—chooses specific columns.
Core Binary Set Operators (require union-compatibility)
- Union: combines rows of two tables, eliminating duplicates.
- Intersect: returns only rows common to both tables.
- Difference: rows in table A that are not in table B.
- Product (Cartesian product): every row of A paired with every row of B—basis for joins.
Join Variants
- Natural join: equates common-named columns, eliminates duplicate column in result.
- Equijoin: explicit equality condition between specified columns; duplicate columns retained.
- Theta join: comparison operator other than equality ( >, <, ≠, etc.).
- Inner join: keeps only matched rows.
- Outer join: keeps matched rows plus unmatched rows padded with nulls.
- Left outer join: all rows from left table retained.
- Right outer join: all rows from right table retained.
Divide Operator
- Dividend: two-column table R(A,B); Divisor: one-column table S(A).
- Result = single-column table containing every B value that pairs with all A values in S.
Data Dictionary & System Catalog
- Data dictionary: designer-maintained metadata repository describing user-defined tables (names, columns, types, validation rules, etc.).
- System catalog: system-maintained data dictionary that stores metadata for all database objects (tables, views, indexes, constraints, triggers, etc.).
- Naming issues:
- Homonym: identical name used for different attributes—creates ambiguity.
- Synonym: different names for the same attribute—causes redundancy/confusion.
Relationships Among Entities
- One-to-Many (1:M): most common; PK of parent appears as FK in child table.
- One-to-One (1:1): rare; often implemented for security or optional attributes.
- Many-to-Many (M:N): resolved via a composite (bridge / associative) entity possessing FKs referencing each parent and usually its own PK.
Controlling Data Redundancy
- Proper PK–FK design minimizes unnecessary duplicate data.
- Acceptable redundancy cases:
- Performance optimization (denormalization for faster queries).
- Historical tracking (storing now-static past values even if duplicated).
- Always weigh redundancy benefits against update anomalies and storage overhead.
Indexes
- An index is an ordered structure that accelerates row access based on key values.
- Index key = attribute list whose values are mapped to physical row pointers.
- Unique index: enforces uniqueness of key values (often auto-created for PKs).
- Each index belongs to one table, but a table can possess multiple indexes (single-column or composite).
Codd’s 12 (+1) Relational Rules (Concise)
- Information: all data represented as values in tables.
- Guaranteed access: every value accessible via combination
- Systematic treatment of nulls: uniform, type-independent handling.
- Dynamic online catalog: metadata stored as relational tables.
- Comprehensive data sublanguage: at least one complete declarative language (e.g., SQL).
- View updating: theoretically updatable views must be updatable in practice.
- High-level set operations: support bulk insert, update, delete.
- Physical data independence: apps unaffected by storage changes.
- Logical data independence: apps unaffected by table structure tweaks (e.g., column order).
- Integrity independence: constraints definable/stored in catalog, not app code.
- Distribution independence: transparency of data location (local vs. distributed).
- Nonsubversion: no bypass of relational integrity via low-level operations.
- Rule Zero: a DBMS is relational only if it uses relational principles exclusively for data management.
Key Takeaways & Practical Implications
- Tables, keys, and constraints form the foundation of the relational paradigm.
- Design goals: maintain integrity, minimize redundancy, optimize access.
- Relational algebra provides the theoretical basis for SQL operations; understanding it aids in query optimization and correctness.
- A robust data dictionary / catalog is indispensable for governance, security, and developer productivity.
- Indexes and thoughtful enforcement of Codd’s rules influence scalability and evolvability of real-world systems.