Fundamentals of Database Systems

FUNDAMENTALS OF DATABASE SYSTEMS

Overview

  • Title: Fundamentals of Database Systems

  • Authors: Ramez Elmasri and Shamkant B. Navathe

  • Edition: 7th Edition

Chapter 2: Database System Concepts and Architecture

Outline
  • Data Models and Their Categories

  • History of Data Models

  • Schemas, Instances, and States

  • Three-Schema Architecture

  • Data Independence

  • DBMS Languages and Interfaces

  • Database System Utilities and Tools

  • Centralized and Client-Server Architectures

  • Classification of DBMSs

Data Models

Definition
  • Data Model: A set of concepts to describe:

    • The structure of a database.

    • The operations for manipulating these structures.

    • The constraints that the database should obey.

Structure and Constraints
  • Data Model Structure:

    • Constructs are defined to represent the database structure.

    • Constructs include:

    • Elements (data): These are the basic units containing data types.

    • Groups of Elements: Such as entities, records, and tables.

    • Relationships: Connections among groups of elements.

  • Constraints: Specify restrictions on valid data that must be enforced.

Operations
  • Data Model Operations: Include basic operations such as:

    • Insert, delete, update (generic model operations).

    • User-defined operations like computestudentgpa, update_inventory.

  • These operations facilitate database retrievals and updates by referring to data model constructs.

Categories of Data Models

  • Conceptual (high-level, semantic) Data Models:

    • Provide concepts that align closely with user perceptions of data.

    • Also known as entity-based or object-based models.

  • Physical (low-level, internal) Data Models:

    • Describe how data is stored in the computer, often specified ad-hoc via DBMS manuals.

  • Implementation (representational) Data Models:

    • Fall between conceptual and physical models, used in many commercial DBMS implementations (e.g., relational models).

  • Self-Describing Data Models:

    • Combine data descriptions with their associated values.

    • Examples include XML, key-value stores, and some NoSQL systems.

Schemas vs. Instances

Database Schema
  • Database Schema: Description of a database including:

    • Structure of the database.

    • Data types.

    • Constraints on data.

  • Schema Diagram: Illustrative display showing aspects of a database schema.

  • Schema Construct: Components like STUDENT, COURSE that make up the schema.

Database State
  • Database State: Represents the content of a database at a specific moment in time, including all data.

  • Also referred to as Database Instance or Occurrence.

  • The term “instance” is also used in context with individual components, such as record instance and table instance.

Initial and Valid States
  • Initial Database State: The state of the database upon initial loading.

  • Valid State: A state that conforms to the structure and constraints of the database.

  • Distinction:

    • Schema changes infrequently.

    • State changes with every update.

    • Schema referred to as intention; state referred to as extension.

Three-Schema Architecture

  • Description: A framework for describing DBMS structure, separating user applications from the physical database.

  • Schema Levels:

    • Internal Schema: Describes physical storage and access paths (e.g., indexes).

    • Conceptual Schema: Describes the structure and constraints for the overall database for a community of users.

    • External Schemas: Describes user views, typically sharing the data model of the conceptual schema.

  • Mappings:

    • Required to transform requests and data among schema levels.

    • Programs reference the external schema, which the DBMS maps to the internal schema for execution.

    • Data from the internal level is reformatted to suit user views.

Data Independence

  • Logical Data Independence: Ability to change the conceptual schema without altering the external schemas and applications.

  • Physical Data Independence: Ability to change the internal schema without modifying the conceptual schema.

    • Example: Changing internal schema during file structure reorganization or index creation for performance improvement.

  • Effect of Changes:

    • When a lower-level schema is altered, only the mappings to higher-level schemas need adjustments, while higher-level schemas and programs remain unchanged.

DBMS Languages

Data Definition Language (DDL)
  • Utilized by database administrators and designers to specify the conceptual schema.

  • In many DBMS, it defines internal and external schemas/views.

  • Separate languages may include Storage Definition Language (SDL) and View Definition Language (VDL).

    • SDL: Realized via DBMS commands for database administrators (DBA).

    • Common DDL commands: CREATE, ALTER, DROP.

Data Manipulation Language (DML)
  • High-Level or Non-procedural Languages:

    • Examples include SQL (SELECT, INSERT, UPDATE, DELETE); these specify what data to retrieve, not how.

  • Low-Level or Procedural Languages:

    • Require embedding within programming languages, such as PL/SQL and JDBC.

Types of DML
  • High Level: Set-oriented, declarative languages.

  • Low Level: Procedural, requiring multiple records handling through constructs like loops and pointers.

DBMS Programming Language Interfaces
  • Embedded Approach: Embedded SQL for languages like C, C++, SQLJ for Java.

  • Procedure Call Approach: JDBC for Java, ODBC for other APIs.

  • Database Programming Language: ORACLE's PL/SQL, based on SQL.

  • Scripting Languages: PHP (client-side) and Python (server-side) for database programs.

User-Friendly DBMS Interfaces
  • Types:

    • Menu-based (Web-based)

    • Forms-based (for casual users)

    • Graphics-based: Point and Click, Drag and Drop, etc.

    • Query Specification: Can be done on schema diagrams.

Other DBMS Interfaces
  • Natural Language: Use text-based queries.

  • Speech: Voice input for queries and verbal output for responses.

  • Parametric Interfaces: Function keys for specific functions, such as those used by bank tellers.

Database System Utilities

  • Functions:

    • Loading data into a database.

    • Periodic database backup.

    • Reorganizing database file structures.

    • Performance monitoring and report generation utilities.

    • Additional functions like sorting, user monitoring, and data compression.

DBMS Component Modules

  • Users:

    • DBA Staff, Casual Users, Application Programmers, Parametric Users.

  • Commands:

    • DDL statements for schema development.

    • Interactive query commands for database access.

  • Compiling Modules:

    • DDL Compiler, Query Compiler, Precompiler, Host Language Compiler.

  • Execution:

    • Query Optimizer, Runtime Database System.

  • Sub-systems:

    • Concurrency, Backup/Recovery, Data Dictionary.

Centralized and Client-Server Architectures

Centralized DBMS
  • Combines all DBMS components: software, hardware, application programs, and user interface processing.

  • Remote Access: Users connect through terminals; all processing occurs centrally.

Client-Server Architecture
  • Basic 2-Tier Architecture:

    • Features specialized servers (e.g., print server, file server, DBMS server, web server).

    • Clients access servers when needed.

  • Logical 2-Tier Architecture: Displays interaction between multiple client and specialized servers.

DBMS Server

  • Provides database query and transaction services to clients.

  • Often referred to as SQL servers or transaction servers.

  • Applications typically use an API for database interaction via standards like ODBC and JDBC.

Three-Tier Architecture
  • Description:

    • Commonly used in web applications.

    • Consists of an application server (business logic) and web server for data transmission between clients and database servers.

    • Enhances security by partitioning access between clients and database.

Distributed DBMS Variants

  • Homogeneous DDBMS: Same DBMS software across the system.

  • Heterogeneous DDBMS: Different DBMS software across the system.

  • Federated or Multi-database Systems: Loosely coupled databases providing high autonomy.

Cost Considerations for DBMSs

  • Cost Range:

    • Free open-source systems (e.g., MySQL, PostgreSQL) to millions of dollars configurations.

  • Commercial DBMSs offer specialized modules that can be purchased separately (e.g., time-series, spatial data modules).

  • Different licensing options available: site licenses, seat licenses, and single-user licenses.

Other Considerations
  • Access Paths: Fully indexed databases offer keyword access common in search engines.

  • General vs. Special Purpose:

    • Special purpose systems, such as OLTP for reservations, are noted.

History of Data Models

Models Discussed
  • Network Model:

    • Implemented by Honeywell in 1964-65, widely adopted due to CODASYL support.

  • Hierarchical Model: Developed by IBM and North American Rockwell; had a significant customer base via IBM's IMS.

  • Relational Model: Proposed by E.F. Codd in 1970, became the most dominant model for database applications.

  • Object-oriented Models: Data treated as objects, utilized in OOP languages (e.g., C++, Smalltalk).

  • Object-Relational Models: Combines features of relational databases with object-oriented programming.

Pros and Cons of Models
  • Network Model:

    • Advantages: Models complex relationships well, can handle most situations with record and relationship types.

    • Disadvantages: Contains complex pointers making it difficult for optimization.

  • Hierarchical Model:

    • Advantages: Easy to construct, simple language for operations.

    • Disadvantages: Navigational constraints limit optimization.

  • Relational Model:

    • Dominant in modern applications with robust implementations.

  • Object-oriented Models:

    • Focus on treating complex data as objects.

  • Object-Relational Models: Combines object-oriented features into relational systems, losing prominence in the marketplace.