Chapter 8 – Relational Database Model

Learning Objectives

By the end of Chapter 8 you should be able to:
- Describe the logical structure of the relational model and explain how it differs from physical storage structures.
- List and discuss the basic components of the model (tables, rows, columns, keys, relationships, constraints).
- Manipulate table contents through relational operators (select, project, join, union, etc.).
- Explain the roles of the data dictionary and system catalog in metadata management.
- Identify entities, establish relationships, and translate business rules into the relational schema.
- Evaluate and control data redundancy through proper key usage and table design.
- Explain indexing, including index keys, unique indexes, and their impact on performance and integrity.

A Logical View of Data

The relational model presents data logically rather than physically.
Core logical construct is the relation (i.e., table).
Logical simplicity supports disciplined design methodologies and conceptual independence from storage concerns.

Relational Tables: Fundamental Characteristics

A table is perceived as a two-dimensional grid (rows × columns).
Each row (tuple) = one occurrence of an entity.
Each column = one attribute; each column name is unique within its table.
Intersection of row & column = one single data value (atomicity).
All values in the same column must share a common data format / domain.
Each column is defined over an attribute domain (allowable value range).
Ordering of rows or columns is immaterial to the DBMS.
Every table has at least one attribute (or set) that uniquely identifies each row (primary key).

Keys & Dependencies

Key = attribute(s) that determine other attribute values and ensure unique identification.
Determination: Knowing value of one attribute lets you determine another.
- Attribute providing the value = determinant.
- Attribute whose value is obtained = dependent.
Functional dependence: Attribute $X$ functionally determines $Y$ if each $X$ value is associated with precisely one $Y$ value.
Full functional dependence: The entire composite determinant is necessary; no subset of the determinant attributes suffices.

Types of Keys

Composite key: two or more attributes combined to serve as a key.
Key attribute: any attribute that forms part of a key.
Superkey: any attribute set that uniquely identifies rows (may contain unnecessary attributes).
Candidate key: minimal (irreducible) superkey—no proper subset is a superkey.
Primary key (PK): candidate key selected to identify rows; cannot contain nulls.
Entity integrity rule: every table row must have a non-null and unique PK.
Foreign key (FK): PK from one table placed in another to represent a relationship.
Referential integrity rule: every FK is either null (when allowed) or matches an existing PK in the referenced table.
Secondary key: attribute(s) used strictly for data retrieval (e.g., search, grouping) rather than row identity.
Null represents the absence of a value; can mean unknown, not yet available, or inapplicable.

Handling Nulls & Column Constraints

NOT NULL constraint: enforces that each row supplies a value for the column.
UNIQUE constraint: prevents duplicate values within the column.
Flags: special codes stored in a column to explicitly denote missing/exception conditions.

Integrity Rules in Practice

RDBMSs enforce entity & referential integrity automatically; nonetheless, design must honor them to avoid run-time rejection or anomalies.
Deleting a parent row whose PK is referenced by a child FK is disallowed unless cascading actions or nullification have been predefined.

Relational Algebra & Closure

Relvar = variable holding a relation; consists of a heading (attribute names) and a body (tuples).
Closure property: any relational algebra operation produces a new relation that can itself be queried.

Core Unary Operators

Select (Restrict): horizontal subset—filters rows satisfying a predicate.
Project: vertical subset—chooses specific columns.

Core Binary Set Operators (require union-compatibility)

Union: combines rows of two tables, eliminating duplicates.
Intersect: returns only rows common to both tables.
Difference: rows in table A that are not in table B.
Product (Cartesian product): every row of A paired with every row of B—basis for joins.

Join Variants

Natural join: equates common-named columns, eliminates duplicate column in result.
Equijoin: explicit equality condition between specified columns; duplicate columns retained.
Theta join: comparison operator other than equality ( >, <, ≠, etc.).
Inner join: keeps only matched rows.
Outer join: keeps matched rows plus unmatched rows padded with nulls.
- Left outer join: all rows from left table retained.
- Right outer join: all rows from right table retained.

Divide Operator

Dividend: two-column table $R(A,B)$ ; Divisor: one-column table $S(A)$ .
Result = single-column table containing every $B$ value that pairs with all $A$ values in $S$ .

Data Dictionary & System Catalog

Data dictionary: designer-maintained metadata repository describing user-defined tables (names, columns, types, validation rules, etc.).
System catalog: system-maintained data dictionary that stores metadata for all database objects (tables, views, indexes, constraints, triggers, etc.).
Naming issues:
- Homonym: identical name used for different attributes—creates ambiguity.
- Synonym: different names for the same attribute—causes redundancy/confusion.

Relationships Among Entities

One-to-Many ( $1{:}M$ ): most common; PK of parent appears as FK in child table.
One-to-One ( $1{:}1$ ): rare; often implemented for security or optional attributes.
Many-to-Many ( $M{:}N$ ): resolved via a composite (bridge / associative) entity possessing FKs referencing each parent and usually its own PK.

Controlling Data Redundancy

Proper PK–FK design minimizes unnecessary duplicate data.
Acceptable redundancy cases:
- Performance optimization (denormalization for faster queries).
- Historical tracking (storing now-static past values even if duplicated).
Always weigh redundancy benefits against update anomalies and storage overhead.

Indexes

An index is an ordered structure that accelerates row access based on key values.
Index key = attribute list whose values are mapped to physical row pointers.
Unique index: enforces uniqueness of key values (often auto-created for PKs).
Each index belongs to one table, but a table can possess multiple indexes (single-column or composite).

Codd’s 12 (+1) Relational Rules (Concise)

Information: all data represented as values in tables.
Guaranteed access: every value accessible via combination .
Systematic treatment of nulls: uniform, type-independent handling.
Dynamic online catalog: metadata stored as relational tables.
Comprehensive data sublanguage: at least one complete declarative language (e.g., SQL).
View updating: theoretically updatable views must be updatable in practice.
High-level set operations: support bulk insert, update, delete.
Physical data independence: apps unaffected by storage changes.
Logical data independence: apps unaffected by table structure tweaks (e.g., column order).
Integrity independence: constraints definable/stored in catalog, not app code.
Distribution independence: transparency of data location (local vs. distributed).
Nonsubversion: no bypass of relational integrity via low-level operations.
Rule Zero: a DBMS is relational only if it uses relational principles exclusively for data management.

Key Takeaways & Practical Implications

Tables, keys, and constraints form the foundation of the relational paradigm.
Design goals: maintain integrity, minimize redundancy, optimize access.
Relational algebra provides the theoretical basis for SQL operations; understanding it aids in query optimization and correctness.
A robust data dictionary / catalog is indispensable for governance, security, and developer productivity.
Indexes and thoughtful enforcement of Codd’s rules influence scalability and evolvability of real-world systems.