V

Lecture2 - Relational Models and Database Schema

Data Models Overview

  • A data model is a collection of high-level description concepts for describing data. It hides low-level storage details.

  • It describes the structure of a database and constraints the database should obey.

  • A schema is a description of a particular collection of data using a given data model.

Data Model Categories
  • Conceptual (high-level, semantic) data models

    • Close to how users perceive data.

    • Concepts: entities, attributes, relationships.

    • Examples:

    • Entity: real-world object like student, lecturer.

    • Attribute: property of an entity.

    • Relationship: association between entities (e.g., teaches).

  • Physical (low-level, internal) data models

    • Describe how data is stored in the computer.

  • Implementation (representational) data models

    • Between conceptual and physical, balancing user view with storage details.

    • Includes relational, network, and hierarchical models.

Data Models in Practice
  • ER model is commonly used for database design.

  • Database design in the ER model is usually converted to the relational model for storage and processing.


Entity-Relationship (ER) Model

  • Core concepts:

    • Entities (objects): e.g., customers, accounts, bank branches.

    • Relationships: associations among entities (e.g., a depositor relationship between customers and accounts).

  • Relationship sets illustrate how entities interact (e.g., Depositor links customers to accounts).

  • Widely used for database design; a typical workflow converts ER designs to relational schemas for implementation.

ER Model Example (from the transcript)
  • Example schema components (illustrative):

    • Entities: Customer, Account

    • Attributes: customer-id, customer-name, customer-street, customer-city, account-number, balance

    • Relationship: Depositor between Customer and Account

  • Visual representation shows entities with attributes and lines for relationships.


The Relational Data Model

  • Key idea: data is stored in relations (tables).

  • Schema:
    \text{RelationName}(field1 : type1 , \dots , fieldn : typen )

  • Query language: SQL (declarative). Example:

  SELECT balance
  FROM account
  WHERE branch = 'Springfield';
  • Characteristics:

    • Programmer specifies what results are needed, not how to execute.

    • DBMS selects execution strategy.

    • Provides physical data independence: applications don’t need to know how data is stored.

Relational Model: Structural Independence and Query Power
  • Structural independence: one can implement various associations (one-to-one, one-to-many, many-to-many) easily in the relational model.

  • Relational model has powerful and flexible query capabilities.

Relational Model Example 2
  • A sample relational database includes relations such as:

    • Students(sid: string, name: string, age: integer, gpa: real)

  • An instance of the relation can be represented as a table with rows (tuples) and columns (attributes).

  • Example data excerpt (simplified):

    • customer-id, customer-name, customer-street, customer-city, phone, etc. (illustrative population from the transcript)


Relational Model Terminology

  • Relation: a table with columns (attributes) and rows (tuples).

  • Attribute: a named column of a relation.

  • Domain: set of allowable values for an attribute.

  • Tuple: a row of a relation.

  • Degree (arity): number of attributes in a relation.

  • Cardinality: number of tuples in a relation.

  • Relational Database: a collection of normalized relations with distinct relation names.

  • Relation name: unique among relation names in the schema.

  • Atomicity: each cell contains exactly one value.

  • Attribute names: distinct.

  • Domain-consistency: values of an attribute come from the same domain.

  • Tuple distinctness: no duplicate tuples.

  • Order irrelevance: order of attributes and tuples is not significant (theoretically).


Relational Keys

  • Superkey: a set of attributes that uniquely identifies a tuple within a relation.

  • Candidate Key: a minimal superkey (no proper subset is a superkey).

  • Primary Key: chosen candidate key to identify tuples uniquely.

  • Alternate Keys: candidate keys not chosen as primary key.

  • Foreign Key: attribute(s) in one relation that reference a candidate key of another relation.


Nulls and Integrity

  • NULL values represent missing or inapplicable data.

    • Meanings:

    • Value unknown.

    • Value exists but is not available.

    • Attribute does not apply to this tuple (value undefined).

  • NULLs are not the same as zero or spaces.


Database Constraints

  • Entity Integrity (primary key): no primary key value can be NULL.

  • Referential Integrity (foreign key): FK values must reference an existing PK value in the referenced relation, or be NULL (with caveats).

  • Domain constraints: data types, ranges, and formats enforced by the domain.

  • Semantic (business) integrity constraints: application-specific rules that may require a constraint language to express (e.g., max hours per employee on a project is 56 hours per week).


Examples of Schema and Relations

  • Example schema (COMPANY): includes EMPLOYEE, DEPARTMENT, PROJECT, WORKS_ON, DEPENDENT, etc., with keys and relationships depicted.

  • Visuals in the transcript show underlined primary keys and arrows for referential integrity constraints.

  • Example attributes and keys include SSN (primary key for EMPLOYEE), DNUMBER (for DEPARTMENT), PNUMBER (for PROJECT), ESSN (FK in WORKS_ON referencing EMPLOYEE.SSN), etc.


Updating Relations and Integrity

  • Update operations: INSERT, DELETE, MODIFY a tuple.

  • Integrity constraints must not be violated by updates.

  • Updates may be grouped; changes may cascade to maintain integrity (CASCADE, SET NULL) or trigger additional updates.

Update Scenarios
  • If an integrity violation would occur, possible actions include:

    • Reject the operation.

    • Proceed with the operation but raise a violation warning.

    • Trigger cascading updates to restore consistency.

    • Execute a user-defined error-correction routine.


Schema, Schema Diagrams, and Instances

  • Database Schema (intension): descriptions of the database structure and constraints.

  • Schema Diagram: a diagrammatic display of aspects of a database schema, including names of relations and some constraints.

  • Database Instance/State (extension): the actual data at a moment in time.

  • A new database starts empty (no data); initial state includes initial data.

  • Schema changes infrequently; instance/state changes on updates.


Three-Schema Architecture

  • Levels:

    • External level (Views): multiple views tailored to user groups.

    • Conceptual level (Logical): defines the community-wide structure and constraints.

    • Internal level (Physical): describes storage structures and access paths.

  • A single conceptual schema, multiple external schemas, and potentially multiple internal schemas exist.

External, Conceptual, and Internal Schemas
  • External schemas provide customized access per user group (views).

  • Conceptual schema describes the entire database structure for a community of users.

  • Internal schema describes how data is stored physically, including files and indexes.


Conceptual Schema Design and Internal Details

  • Conceptual schema design involves determining relations (entities), attributes, data types, relationships, and constraints.

  • Internal schema specifies physical storage, indexes, and access paths; decisions depend on data access patterns and DBMS capabilities.

  • The external schema refines the conceptual schema to meet user needs and security requirements.


Data Independence

  • Data Independence:

    • Logical data independence: ability to change the conceptual schema without changing external schemas or applications.

    • Physical data independence: ability to change the internal schema without changing the conceptual schema.

  • When a lower-level schema changes, mappings to higher-level schemas change, while higher-level schemas remain unchanged.


DBMS Languages and Processing

  • High-level (non-procedural) languages: e.g., SQL. Set-oriented; specify what data to retrieve, not how.

  • Low-level (procedural) languages: record-at-a-time; specify how to retrieve data; include loops and other control structures.

  • DBMS components:

    • Storage Manager: interface between DBMS and physical data stored in files.

    • Query Processor: compiles and evaluates queries; includes DDL interpreter, DML compiler, and the query evaluation engine.

Storage Manager Details
  • Responsibilities:

    • Interface between low-level data storage and applications/queries.

    • Translates DML statements into file operations.

    • Stores, retrieves, and updates data.

  • Data structures managed: data files, data dictionary (metadata), indices for fast access.

Query Processor Details
  • Components:

    • DDL Interpreter: processes DDL statements and updates the data dictionary.

    • DML Compiler: translates DML into an evaluation plan understood by the query engine.

    • Query Evaluation Engine: executes the low-level instructions.


Client-Server Architectures

  • Two-tier (client-server) architecture:

    • Tier 1: Client – manages user interface and applications.

    • Tier 2: Database server – holds database and DBMS; performs server-side validation and data access.

  • Advantages of two-tier:

    • Wider access to databases.

    • Increased performance and potential cost reductions.

    • Reduced client hardware costs and communication costs.

    • Improved consistency.

Three-Tier Client Architecture
  • Evolution to three layers to address scalability:

    • Tier 1: Client – user interface.

    • Tier 2: Application server – business logic and data processing logic.

    • Tier 3: Database server – data validation and database access.

  • Advantages of three-tier:

    • Client can have lighter hardware.

    • Centralized application maintenance.

    • Easier modification or replacement of one tier with minimal impact on others.

    • Better support for load balancing and web environments.


Practical Connections and Real-World Relevance

  • Relational model provides a flexible, scalable basis for modern database systems used across industries.

  • Separation of concerns (schema vs data, external vs conceptual vs internal) enables evolution: you can change how data is stored without breaking applications.

  • Integrity constraints and normalization principles protect data quality and consistency in real-world systems.

  • Understanding keys, constraints, and update propagation is essential for reliable database design and maintenance.


Quick Reference Concepts (Recap)

  • Data model, schema, instance/state, intension vs extension.

  • ER model, entities, relationships, relationship sets.

  • Relational model: relations, attributes, domains, tuples, keys, and constraints.

  • Keys: superkey, candidate key, primary key, alternate keys, foreign key.

  • NULL values and their meanings.

  • Integrity constraints: entity, referential, domain, semantic.

  • Update operations and cascading effects.

  • Schema diagrams and example schemas (e.g., COMPANY).

  • Three-schema architecture (external/conceptual/internal) and data independence.

  • DBMS architecture: storage manager, query processor, and their subcomponents.

  • Client-server architectures: two-tier and three-tier, with associated advantages.

  • Declarative vs procedural languages; SQL as an example of a high-level language.

  • Examples of relational schemas and relations from the transcript (e.g., EMPLOYEE, DEPARTMENT, PROJECT, WORKS_ON) to illustrate keys and constraints.

Note: An explicit example mentioned in the material is the constraint: the max hours per employee on all projects is 56 hours per week, illustrating semantic integrity constraints.