A data model is a collection of high-level description concepts for describing data. It hides low-level storage details.
It describes the structure of a database and constraints the database should obey.
A schema is a description of a particular collection of data using a given data model.
Conceptual (high-level, semantic) data models
Close to how users perceive data.
Concepts: entities, attributes, relationships.
Examples:
Entity: real-world object like student, lecturer.
Attribute: property of an entity.
Relationship: association between entities (e.g., teaches).
Physical (low-level, internal) data models
Describe how data is stored in the computer.
Implementation (representational) data models
Between conceptual and physical, balancing user view with storage details.
Includes relational, network, and hierarchical models.
ER model is commonly used for database design.
Database design in the ER model is usually converted to the relational model for storage and processing.
Core concepts:
Entities (objects): e.g., customers, accounts, bank branches.
Relationships: associations among entities (e.g., a depositor relationship between customers and accounts).
Relationship sets illustrate how entities interact (e.g., Depositor links customers to accounts).
Widely used for database design; a typical workflow converts ER designs to relational schemas for implementation.
Example schema components (illustrative):
Entities: Customer, Account
Attributes: customer-id, customer-name, customer-street, customer-city, account-number, balance
Relationship: Depositor between Customer and Account
Visual representation shows entities with attributes and lines for relationships.
Key idea: data is stored in relations (tables).
Schema:
\text{RelationName}(field1 : type1 , \dots , fieldn : typen )
Query language: SQL (declarative). Example:
SELECT balance
FROM account
WHERE branch = 'Springfield';
Characteristics:
Programmer specifies what results are needed, not how to execute.
DBMS selects execution strategy.
Provides physical data independence: applications don’t need to know how data is stored.
Structural independence: one can implement various associations (one-to-one, one-to-many, many-to-many) easily in the relational model.
Relational model has powerful and flexible query capabilities.
A sample relational database includes relations such as:
Students(sid: string, name: string, age: integer, gpa: real)
An instance of the relation can be represented as a table with rows (tuples) and columns (attributes).
Example data excerpt (simplified):
customer-id, customer-name, customer-street, customer-city, phone, etc. (illustrative population from the transcript)
Relation: a table with columns (attributes) and rows (tuples).
Attribute: a named column of a relation.
Domain: set of allowable values for an attribute.
Tuple: a row of a relation.
Degree (arity): number of attributes in a relation.
Cardinality: number of tuples in a relation.
Relational Database: a collection of normalized relations with distinct relation names.
Relation name: unique among relation names in the schema.
Atomicity: each cell contains exactly one value.
Attribute names: distinct.
Domain-consistency: values of an attribute come from the same domain.
Tuple distinctness: no duplicate tuples.
Order irrelevance: order of attributes and tuples is not significant (theoretically).
Superkey: a set of attributes that uniquely identifies a tuple within a relation.
Candidate Key: a minimal superkey (no proper subset is a superkey).
Primary Key: chosen candidate key to identify tuples uniquely.
Alternate Keys: candidate keys not chosen as primary key.
Foreign Key: attribute(s) in one relation that reference a candidate key of another relation.
NULL values represent missing or inapplicable data.
Meanings:
Value unknown.
Value exists but is not available.
Attribute does not apply to this tuple (value undefined).
NULLs are not the same as zero or spaces.
Entity Integrity (primary key): no primary key value can be NULL.
Referential Integrity (foreign key): FK values must reference an existing PK value in the referenced relation, or be NULL (with caveats).
Domain constraints: data types, ranges, and formats enforced by the domain.
Semantic (business) integrity constraints: application-specific rules that may require a constraint language to express (e.g., max hours per employee on a project is 56 hours per week).
Example schema (COMPANY): includes EMPLOYEE, DEPARTMENT, PROJECT, WORKS_ON, DEPENDENT, etc., with keys and relationships depicted.
Visuals in the transcript show underlined primary keys and arrows for referential integrity constraints.
Example attributes and keys include SSN (primary key for EMPLOYEE), DNUMBER (for DEPARTMENT), PNUMBER (for PROJECT), ESSN (FK in WORKS_ON referencing EMPLOYEE.SSN), etc.
Update operations: INSERT, DELETE, MODIFY a tuple.
Integrity constraints must not be violated by updates.
Updates may be grouped; changes may cascade to maintain integrity (CASCADE, SET NULL) or trigger additional updates.
If an integrity violation would occur, possible actions include:
Reject the operation.
Proceed with the operation but raise a violation warning.
Trigger cascading updates to restore consistency.
Execute a user-defined error-correction routine.
Database Schema (intension): descriptions of the database structure and constraints.
Schema Diagram: a diagrammatic display of aspects of a database schema, including names of relations and some constraints.
Database Instance/State (extension): the actual data at a moment in time.
A new database starts empty (no data); initial state includes initial data.
Schema changes infrequently; instance/state changes on updates.
Levels:
External level (Views): multiple views tailored to user groups.
Conceptual level (Logical): defines the community-wide structure and constraints.
Internal level (Physical): describes storage structures and access paths.
A single conceptual schema, multiple external schemas, and potentially multiple internal schemas exist.
External schemas provide customized access per user group (views).
Conceptual schema describes the entire database structure for a community of users.
Internal schema describes how data is stored physically, including files and indexes.
Conceptual schema design involves determining relations (entities), attributes, data types, relationships, and constraints.
Internal schema specifies physical storage, indexes, and access paths; decisions depend on data access patterns and DBMS capabilities.
The external schema refines the conceptual schema to meet user needs and security requirements.
Data Independence:
Logical data independence: ability to change the conceptual schema without changing external schemas or applications.
Physical data independence: ability to change the internal schema without changing the conceptual schema.
When a lower-level schema changes, mappings to higher-level schemas change, while higher-level schemas remain unchanged.
High-level (non-procedural) languages: e.g., SQL. Set-oriented; specify what data to retrieve, not how.
Low-level (procedural) languages: record-at-a-time; specify how to retrieve data; include loops and other control structures.
DBMS components:
Storage Manager: interface between DBMS and physical data stored in files.
Query Processor: compiles and evaluates queries; includes DDL interpreter, DML compiler, and the query evaluation engine.
Responsibilities:
Interface between low-level data storage and applications/queries.
Translates DML statements into file operations.
Stores, retrieves, and updates data.
Data structures managed: data files, data dictionary (metadata), indices for fast access.
Components:
DDL Interpreter: processes DDL statements and updates the data dictionary.
DML Compiler: translates DML into an evaluation plan understood by the query engine.
Query Evaluation Engine: executes the low-level instructions.
Two-tier (client-server) architecture:
Tier 1: Client – manages user interface and applications.
Tier 2: Database server – holds database and DBMS; performs server-side validation and data access.
Advantages of two-tier:
Wider access to databases.
Increased performance and potential cost reductions.
Reduced client hardware costs and communication costs.
Improved consistency.
Evolution to three layers to address scalability:
Tier 1: Client – user interface.
Tier 2: Application server – business logic and data processing logic.
Tier 3: Database server – data validation and database access.
Advantages of three-tier:
Client can have lighter hardware.
Centralized application maintenance.
Easier modification or replacement of one tier with minimal impact on others.
Better support for load balancing and web environments.
Relational model provides a flexible, scalable basis for modern database systems used across industries.
Separation of concerns (schema vs data, external vs conceptual vs internal) enables evolution: you can change how data is stored without breaking applications.
Integrity constraints and normalization principles protect data quality and consistency in real-world systems.
Understanding keys, constraints, and update propagation is essential for reliable database design and maintenance.
Data model, schema, instance/state, intension vs extension.
ER model, entities, relationships, relationship sets.
Relational model: relations, attributes, domains, tuples, keys, and constraints.
Keys: superkey, candidate key, primary key, alternate keys, foreign key.
NULL values and their meanings.
Integrity constraints: entity, referential, domain, semantic.
Update operations and cascading effects.
Schema diagrams and example schemas (e.g., COMPANY).
Three-schema architecture (external/conceptual/internal) and data independence.
DBMS architecture: storage manager, query processor, and their subcomponents.
Client-server architectures: two-tier and three-tier, with associated advantages.
Declarative vs procedural languages; SQL as an example of a high-level language.
Examples of relational schemas and relations from the transcript (e.g., EMPLOYEE, DEPARTMENT, PROJECT, WORKS_ON) to illustrate keys and constraints.
Note: An explicit example mentioned in the material is the constraint: the max hours per employee on all projects is 56 hours per week, illustrating semantic integrity constraints.