Fundamentals of Database Systems
FUNDAMENTALS OF DATABASE SYSTEMS
Overview
Title: Fundamentals of Database Systems
Authors: Ramez Elmasri and Shamkant B. Navathe
Edition: 7th Edition
Chapter 2: Database System Concepts and Architecture
Outline
Data Models and Their Categories
History of Data Models
Schemas, Instances, and States
Three-Schema Architecture
Data Independence
DBMS Languages and Interfaces
Database System Utilities and Tools
Centralized and Client-Server Architectures
Classification of DBMSs
Data Models
Definition
Data Model: A set of concepts to describe:
The structure of a database.
The operations for manipulating these structures.
The constraints that the database should obey.
Structure and Constraints
Data Model Structure:
Constructs are defined to represent the database structure.
Constructs include:
Elements (data): These are the basic units containing data types.
Groups of Elements: Such as entities, records, and tables.
Relationships: Connections among groups of elements.
Constraints: Specify restrictions on valid data that must be enforced.
Operations
Data Model Operations: Include basic operations such as:
Insert, delete, update (generic model operations).
User-defined operations like computestudentgpa, update_inventory.
These operations facilitate database retrievals and updates by referring to data model constructs.
Categories of Data Models
Conceptual (high-level, semantic) Data Models:
Provide concepts that align closely with user perceptions of data.
Also known as entity-based or object-based models.
Physical (low-level, internal) Data Models:
Describe how data is stored in the computer, often specified ad-hoc via DBMS manuals.
Implementation (representational) Data Models:
Fall between conceptual and physical models, used in many commercial DBMS implementations (e.g., relational models).
Self-Describing Data Models:
Combine data descriptions with their associated values.
Examples include XML, key-value stores, and some NoSQL systems.
Schemas vs. Instances
Database Schema
Database Schema: Description of a database including:
Structure of the database.
Data types.
Constraints on data.
Schema Diagram: Illustrative display showing aspects of a database schema.
Schema Construct: Components like STUDENT, COURSE that make up the schema.
Database State
Database State: Represents the content of a database at a specific moment in time, including all data.
Also referred to as Database Instance or Occurrence.
The term “instance” is also used in context with individual components, such as record instance and table instance.
Initial and Valid States
Initial Database State: The state of the database upon initial loading.
Valid State: A state that conforms to the structure and constraints of the database.
Distinction:
Schema changes infrequently.
State changes with every update.
Schema referred to as intention; state referred to as extension.
Three-Schema Architecture
Description: A framework for describing DBMS structure, separating user applications from the physical database.
Schema Levels:
Internal Schema: Describes physical storage and access paths (e.g., indexes).
Conceptual Schema: Describes the structure and constraints for the overall database for a community of users.
External Schemas: Describes user views, typically sharing the data model of the conceptual schema.
Mappings:
Required to transform requests and data among schema levels.
Programs reference the external schema, which the DBMS maps to the internal schema for execution.
Data from the internal level is reformatted to suit user views.
Data Independence
Logical Data Independence: Ability to change the conceptual schema without altering the external schemas and applications.
Physical Data Independence: Ability to change the internal schema without modifying the conceptual schema.
Example: Changing internal schema during file structure reorganization or index creation for performance improvement.
Effect of Changes:
When a lower-level schema is altered, only the mappings to higher-level schemas need adjustments, while higher-level schemas and programs remain unchanged.
DBMS Languages
Data Definition Language (DDL)
Utilized by database administrators and designers to specify the conceptual schema.
In many DBMS, it defines internal and external schemas/views.
Separate languages may include Storage Definition Language (SDL) and View Definition Language (VDL).
SDL: Realized via DBMS commands for database administrators (DBA).
Common DDL commands: CREATE, ALTER, DROP.
Data Manipulation Language (DML)
High-Level or Non-procedural Languages:
Examples include SQL (SELECT, INSERT, UPDATE, DELETE); these specify what data to retrieve, not how.
Low-Level or Procedural Languages:
Require embedding within programming languages, such as PL/SQL and JDBC.
Types of DML
High Level: Set-oriented, declarative languages.
Low Level: Procedural, requiring multiple records handling through constructs like loops and pointers.
DBMS Programming Language Interfaces
Embedded Approach: Embedded SQL for languages like C, C++, SQLJ for Java.
Procedure Call Approach: JDBC for Java, ODBC for other APIs.
Database Programming Language: ORACLE's PL/SQL, based on SQL.
Scripting Languages: PHP (client-side) and Python (server-side) for database programs.
User-Friendly DBMS Interfaces
Types:
Menu-based (Web-based)
Forms-based (for casual users)
Graphics-based: Point and Click, Drag and Drop, etc.
Query Specification: Can be done on schema diagrams.
Other DBMS Interfaces
Natural Language: Use text-based queries.
Speech: Voice input for queries and verbal output for responses.
Parametric Interfaces: Function keys for specific functions, such as those used by bank tellers.
Database System Utilities
Functions:
Loading data into a database.
Periodic database backup.
Reorganizing database file structures.
Performance monitoring and report generation utilities.
Additional functions like sorting, user monitoring, and data compression.
DBMS Component Modules
Users:
DBA Staff, Casual Users, Application Programmers, Parametric Users.
Commands:
DDL statements for schema development.
Interactive query commands for database access.
Compiling Modules:
DDL Compiler, Query Compiler, Precompiler, Host Language Compiler.
Execution:
Query Optimizer, Runtime Database System.
Sub-systems:
Concurrency, Backup/Recovery, Data Dictionary.
Centralized and Client-Server Architectures
Centralized DBMS
Combines all DBMS components: software, hardware, application programs, and user interface processing.
Remote Access: Users connect through terminals; all processing occurs centrally.
Client-Server Architecture
Basic 2-Tier Architecture:
Features specialized servers (e.g., print server, file server, DBMS server, web server).
Clients access servers when needed.
Logical 2-Tier Architecture: Displays interaction between multiple client and specialized servers.
DBMS Server
Provides database query and transaction services to clients.
Often referred to as SQL servers or transaction servers.
Applications typically use an API for database interaction via standards like ODBC and JDBC.
Three-Tier Architecture
Description:
Commonly used in web applications.
Consists of an application server (business logic) and web server for data transmission between clients and database servers.
Enhances security by partitioning access between clients and database.
Distributed DBMS Variants
Homogeneous DDBMS: Same DBMS software across the system.
Heterogeneous DDBMS: Different DBMS software across the system.
Federated or Multi-database Systems: Loosely coupled databases providing high autonomy.
Cost Considerations for DBMSs
Cost Range:
Free open-source systems (e.g., MySQL, PostgreSQL) to millions of dollars configurations.
Commercial DBMSs offer specialized modules that can be purchased separately (e.g., time-series, spatial data modules).
Different licensing options available: site licenses, seat licenses, and single-user licenses.
Other Considerations
Access Paths: Fully indexed databases offer keyword access common in search engines.
General vs. Special Purpose:
Special purpose systems, such as OLTP for reservations, are noted.
History of Data Models
Models Discussed
Network Model:
Implemented by Honeywell in 1964-65, widely adopted due to CODASYL support.
Hierarchical Model: Developed by IBM and North American Rockwell; had a significant customer base via IBM's IMS.
Relational Model: Proposed by E.F. Codd in 1970, became the most dominant model for database applications.
Object-oriented Models: Data treated as objects, utilized in OOP languages (e.g., C++, Smalltalk).
Object-Relational Models: Combines features of relational databases with object-oriented programming.
Pros and Cons of Models
Network Model:
Advantages: Models complex relationships well, can handle most situations with record and relationship types.
Disadvantages: Contains complex pointers making it difficult for optimization.
Hierarchical Model:
Advantages: Easy to construct, simple language for operations.
Disadvantages: Navigational constraints limit optimization.
Relational Model:
Dominant in modern applications with robust implementations.
Object-oriented Models:
Focus on treating complex data as objects.
Object-Relational Models: Combines object-oriented features into relational systems, losing prominence in the marketplace.