Database Management Systems: Chapter 1 Study Notes

CHAPTER 1: Introduction

  • A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data.

    • The collection of data, referred to as the database, contains information relevant to an enterprise.
    • Primary goal of a DBMS: To provide a way to store and retrieve database information that is both convenient and efficient.
  • Management of Data:

    • Involves defining structures for storage of information and providing mechanisms for manipulation of information.
    • Must ensure safety of information stored despite system crashes or unauthorized access.
    • If data is shared among several users, the system must avoid anomalous results.
  • The importance of information in organizations has led to a large body of concepts and techniques for managing data. These concepts and techniques are the focus of this book.

1.1 Database-System Applications

  • The earliest database systems emerged in the 1960s to manage commercial data. Earlier applications were relatively simple compared to modern database applications.

  • Modern applications: Include complex, worldwide enterprises. The central aspect of any application is the data themselves, not the program performing calculations.

  • Value of corporations today: Often comes from the information they own, rather than physical assets, e.g., a bank without customer data loses nearly all its value.

Characteristics of Database Systems:
  • Manage collections of data that:

    • Are highly valuable,
    • Are relatively large,
    • Are accessed by multiple users and applications, often simultaneously.
  • Types of Data in Database Applications:

    • Structured Data: Data with a standard repeating structure, e.g., university records for students and courses.
    • Example:
      • Course records contain course-identifier, title, department, course number.
      • Student records contain student-identifier, name, address, phone.
      • Course registration is a collection of pairs: one course identifier and one student identifier.
    • Complex Data: Modern applications like social networks handle highly variable data including text, images, videos, etc.
    • Both structured and weakly structured data support basic features of a database.
Managing Complexity
  • Abstraction: Enables users to operate complex systems without needing to understand intricate details.
    • Example: A person can drive a car by knowing how to use its controls, not how the engine functions.
  • Database systems abstract data complexity to provide unified repositories of information necessary for enterprise operations.
Representative Applications of Database Systems:
  • Enterprise Information:

    • Sales, Accounting, Human Resources.
  • Manufacturing:

    • Management of supply chain, production tracking, inventory management.
  • Banking and Finance:

    • Customer information, accounts, transactions.
  • Universities:

    • Student data, course registrations, grades.
  • Airlines and Telecommunication:

    • Reservations, communication records, billing.
  • Web Services:

    • Social media, online retailers, advertisements.
  • Document Databases: Articles, patents, research papers.

  • Navigation Systems:

    • Location tracking and route management.
  • The interaction between users and databases has changed. Early databases were back-office systems, evolving to support user interfaces for queries and data updates.

    • Modern Interaction: Web applications and mobile applications allow direct customer interaction with databases.
Database Modes of Use:
  • Online Transaction Processing: Supports multiple users accessing small data amounts and performing small updates.
  • Data Analytics: Involves processing data to infer conclusions or rules used to influence business decisions.

1.2 Purpose of Database Systems

  • A university organization keeps data about students, instructors, departments, course offerings.

  • Storing information in operating system files introduces issues:

    • Data Redundancy and Inconsistency: Different programmers may create files leading to redundancy and inconsistency due to updates not reflected across files.
    • Difficulty in Access: Lack of adequate programs to meet data retrieval queries can lead to inefficiency.
    • Data Isolation: Scattered storage in assorted files makes retrieving data challenging.
    • Integrity Problems: Ensuring consistency constraints is complex and error-prone when using multiple application programs.
    • Atomicity Problems: A system failure during a multi-step program (e.g., fund transfers) can lead to inconsistent database states.
    • Concurrent-Access Anomalies: Simultaneous updates can produce inconsistent data if not properly managed.
    • Security Problems: Ensuring the right users access appropriate data adds to the complexity of maintaining a file-processing system.
  • The above problems contributed to the development of database systems during the 1960s and 1970s.

  • The chapter will present concepts enabling database systems to overcome file-processing system limitations.

1.3 View of Data

  • A database system comprises interrelated data and programs for data access and modification, providing an abstract view of the data.
1.3.1 Data Models:
  • The data model forms the basis of the database. It comprises conceptual tools for describing data, relationships, semantics, and constraints.
  • Categories of Data Models:
    • Relational Model: Uses tables to represent data and their relationships. Each table consists of records corresponding to attributes. It is a record-based model widely employed in current database systems.
    • Entity-Relationship Model (E-R): Consists of entities and relationships used commonly in database design.
    • Semi-Structured Data Model: Allows variations in attributes for the same data type; exemplified by JSON and XML.
    • Object-Based Data Model: Integrates object-oriented programming concepts into relational databases, supporting methods and encapsulation.
1.3.2 Relational Data Model:
  • Data is represented through tables with multiple columns, each having unique names.
  • A relational database includes several tables. Example: Tables for instructors and departments demonstrate relationships through shared attributes.
1.3.3 Data Abstraction:
  • To streamline user interaction, database systems implement several levels of data abstraction:
    • Physical Level: Lowest abstraction level describing actual storage.
    • Logical Level: Details what data are stored and relationships among them.
    • View Level: Highest level focusing on parts of the database needed by users, hiding complex details.
1.3.4 Instances and Schemas:
  • An instance is the data in a database at any given moment, while the schema is the overall database design.
  • Understanding schemas vs instances is analogous to programming variable declarations and current values.
    • Database systems may explore multiple schemas, each revealing different abstraction levels.

1.4 Database Languages

  • Database systems provide a data-definition language (DDL) for schema specification and a data-manipulation language (DML) for queries and updates.
1.4.1 Data-Definition Language (DDL):
  • The DDL specifies schemas and additional data properties. It permits defining storage structures and constraints that must be satisfied during data updates.

  • **Types of Constraints:

    • Domain Constraints: Define permissible values for attributes.
    • Referential Integrity: Ensures values in one relation are valid in another.
    • Authorization: Specifies access permissions for users.
  • Output of DDL is stored in the data dictionary. It holds metadata about the database.

1.4.2 SQL Data-Definition Language:
  • SQL provides a rich DDL to define tables and constraints. Example statement for creating department table:
create table department (dept name char(20), building char(15), budget numeric(12,2));
1.4.3 Data-Manipulation Language (DML):
  • DML enables data access or manipulation, facilitating four primary functions: retrieval, insertion, deletion, and modification of data.
  • Types of DML:
    • Procedural DMLs: Users specify what data are needed and how to access them.
    • Declarative DMLs: Users only state what data they need without outlining how to retrieve them.
1.4.4 SQL Data-Manipulation Language:
  • SQL is nonprocedural, returning results in table form. Example SQL query to retrieve names of History department instructors:
select instructor.name from instructor where instructor.dept name = 'History';
1.4.5 Database Access from Application Programs:
  • Application programs serve as intermediaries to execute DML statements and provide user interaction.
  • Access to database involves communication through APIs such as ODBC for C and JDBC for Java.

1.5 Database Design

  • Database systems are designed to manage vast amounts of data essential for enterprise operations.
  • Database Schema Design: Primarily focuses on the feedback from users defining the data requirements and relationships needed for efficient data handling.
Phases of Database Design:
  • Specification: Defined requirements based on user input by interacting with domain experts.
  • Conceptual Schema: Translates requirements into a conceptual overview focusing on data interactions.
  • Logical Design: The phase where mapping occurs between conceptual schemas and implementation structures.
  • Physical Design: Dictates storage organization and access mechanisms.

1.6 Database Engine

  • Database systems comprise modules that handle various responsibilities, including storage, query processing, and transaction management.

- Storage Manager: Manages data storage, interacts with the file manager for data organization, ensures integrity and authorization of access.

Components of Storage Manager:
  • File Manager: Allocates disk space and manages structures.
  • Authorization Manager: Ensures access rights for users based on defined permissions.
  • Buffer Manager: Handles fetching data between disk and memory efficiently.
Query Processor Components:
  • DDL Interpreter: Interprets DDL statements for data dictionaries.
  • DML Compiler: Converts DML queries into low-level instructions, performing optimization to assess execution plans.
  • Query Evaluation Engine: Executes the instructions following the evaluated plan.
Transaction Management:
  • A transaction represents operations that maintain database integrity, supporting aspects like atomicity, consistency, and durability.

1.7 Database and Application Architecture

  • Database System Architecture: User interaction occurs in a centralized server framework with connections among users, storage engines, and processing components.
  • Two-Tier vs Three-Tier Architecture: Moving towards a three-tier architecture for modern applications promotes security and performance versus the simplicity of a two-tier system.

1.8 Database Users and Administrators

  • Individuals interacting with databases can be categorized into users based on their interaction level and database administrators who hold authority over the system.
Database User Types:
  • Naïve Users: Unsophisticated users who engage through predefined interfaces.
  • Application Programmers: Develop applications for user interaction.
  • Sophisticated Users: Analyze data without programming, using querying tools.
Database Administrators (DBA) Responsibilities:
  • Schema Definition: Create original schemas using DDL statements.
  • Storage and Access Methods: Define storage organization and access parameters.
  • Authorization Management: Control users' access levels and enforce data security.
  • Routine Maintenance: Backups, performance monitoring, etc.

1.9 History of Database Systems

  • Evolution of data processing from manual methods to modern database systems spanning multiple decades.
Historical Milestones:
  • 1950s-1960s: Transition from magnetic tapes to hard disks facilitating more direct data access.

  • 1970s-1980s: Introduction and rise of relational models demonstrated by projects like System R and subsequent commercial products such as IBM DB2 and Oracle.

  • 1990s: Emergence of decision support applications, increasing emphasis on web interfaces, and parallel database products.

  • 2000s: Growth of semi-structured data handling (XML/JSON support), open-source systems (PostgreSQL, MySQL), evolving requirements for performance and scalability.

  • 2010s: Progress towards improved consistency and abstraction in NoSQL systems leading to cloud storage solutions for enterprises.

  • The influential role of data analytics and consideration of privacy dimensions over information management has heightened the focus on data security and government regulation of data management practices.