Database Management Systems: Chapter 1 Study Notes
CHAPTER 1: Introduction
A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data.
- The collection of data, referred to as the database, contains information relevant to an enterprise.
- Primary goal of a DBMS: To provide a way to store and retrieve database information that is both convenient and efficient.
Management of Data:
- Involves defining structures for storage of information and providing mechanisms for manipulation of information.
- Must ensure safety of information stored despite system crashes or unauthorized access.
- If data is shared among several users, the system must avoid anomalous results.
The importance of information in organizations has led to a large body of concepts and techniques for managing data. These concepts and techniques are the focus of this book.
1.1 Database-System Applications
The earliest database systems emerged in the 1960s to manage commercial data. Earlier applications were relatively simple compared to modern database applications.
Modern applications: Include complex, worldwide enterprises. The central aspect of any application is the data themselves, not the program performing calculations.
Value of corporations today: Often comes from the information they own, rather than physical assets, e.g., a bank without customer data loses nearly all its value.
Characteristics of Database Systems:
Manage collections of data that:
- Are highly valuable,
- Are relatively large,
- Are accessed by multiple users and applications, often simultaneously.
Types of Data in Database Applications:
- Structured Data: Data with a standard repeating structure, e.g., university records for students and courses.
- Example:
- Course records contain course-identifier, title, department, course number.
- Student records contain student-identifier, name, address, phone.
- Course registration is a collection of pairs: one course identifier and one student identifier.
- Complex Data: Modern applications like social networks handle highly variable data including text, images, videos, etc.
- Both structured and weakly structured data support basic features of a database.
Managing Complexity
- Abstraction: Enables users to operate complex systems without needing to understand intricate details.
- Example: A person can drive a car by knowing how to use its controls, not how the engine functions.
- Database systems abstract data complexity to provide unified repositories of information necessary for enterprise operations.
Representative Applications of Database Systems:
Enterprise Information:
- Sales, Accounting, Human Resources.
Manufacturing:
- Management of supply chain, production tracking, inventory management.
Banking and Finance:
- Customer information, accounts, transactions.
Universities:
- Student data, course registrations, grades.
Airlines and Telecommunication:
- Reservations, communication records, billing.
Web Services:
- Social media, online retailers, advertisements.
Document Databases: Articles, patents, research papers.
Navigation Systems:
- Location tracking and route management.
The interaction between users and databases has changed. Early databases were back-office systems, evolving to support user interfaces for queries and data updates.
- Modern Interaction: Web applications and mobile applications allow direct customer interaction with databases.
Database Modes of Use:
- Online Transaction Processing: Supports multiple users accessing small data amounts and performing small updates.
- Data Analytics: Involves processing data to infer conclusions or rules used to influence business decisions.
1.2 Purpose of Database Systems
A university organization keeps data about students, instructors, departments, course offerings.
Storing information in operating system files introduces issues:
- Data Redundancy and Inconsistency: Different programmers may create files leading to redundancy and inconsistency due to updates not reflected across files.
- Difficulty in Access: Lack of adequate programs to meet data retrieval queries can lead to inefficiency.
- Data Isolation: Scattered storage in assorted files makes retrieving data challenging.
- Integrity Problems: Ensuring consistency constraints is complex and error-prone when using multiple application programs.
- Atomicity Problems: A system failure during a multi-step program (e.g., fund transfers) can lead to inconsistent database states.
- Concurrent-Access Anomalies: Simultaneous updates can produce inconsistent data if not properly managed.
- Security Problems: Ensuring the right users access appropriate data adds to the complexity of maintaining a file-processing system.
The above problems contributed to the development of database systems during the 1960s and 1970s.
The chapter will present concepts enabling database systems to overcome file-processing system limitations.
1.3 View of Data
- A database system comprises interrelated data and programs for data access and modification, providing an abstract view of the data.
1.3.1 Data Models:
- The data model forms the basis of the database. It comprises conceptual tools for describing data, relationships, semantics, and constraints.
- Categories of Data Models:
- Relational Model: Uses tables to represent data and their relationships. Each table consists of records corresponding to attributes. It is a record-based model widely employed in current database systems.
- Entity-Relationship Model (E-R): Consists of entities and relationships used commonly in database design.
- Semi-Structured Data Model: Allows variations in attributes for the same data type; exemplified by JSON and XML.
- Object-Based Data Model: Integrates object-oriented programming concepts into relational databases, supporting methods and encapsulation.
1.3.2 Relational Data Model:
- Data is represented through tables with multiple columns, each having unique names.
- A relational database includes several tables. Example: Tables for instructors and departments demonstrate relationships through shared attributes.
1.3.3 Data Abstraction:
- To streamline user interaction, database systems implement several levels of data abstraction:
- Physical Level: Lowest abstraction level describing actual storage.
- Logical Level: Details what data are stored and relationships among them.
- View Level: Highest level focusing on parts of the database needed by users, hiding complex details.
1.3.4 Instances and Schemas:
- An instance is the data in a database at any given moment, while the schema is the overall database design.
- Understanding schemas vs instances is analogous to programming variable declarations and current values.
- Database systems may explore multiple schemas, each revealing different abstraction levels.
1.4 Database Languages
- Database systems provide a data-definition language (DDL) for schema specification and a data-manipulation language (DML) for queries and updates.
1.4.1 Data-Definition Language (DDL):
The DDL specifies schemas and additional data properties. It permits defining storage structures and constraints that must be satisfied during data updates.
**Types of Constraints:
- Domain Constraints: Define permissible values for attributes.
- Referential Integrity: Ensures values in one relation are valid in another.
- Authorization: Specifies access permissions for users.
Output of DDL is stored in the data dictionary. It holds metadata about the database.
1.4.2 SQL Data-Definition Language:
- SQL provides a rich DDL to define tables and constraints. Example statement for creating department table:
create table department (dept name char(20), building char(15), budget numeric(12,2));
1.4.3 Data-Manipulation Language (DML):
- DML enables data access or manipulation, facilitating four primary functions: retrieval, insertion, deletion, and modification of data.
- Types of DML:
- Procedural DMLs: Users specify what data are needed and how to access them.
- Declarative DMLs: Users only state what data they need without outlining how to retrieve them.
1.4.4 SQL Data-Manipulation Language:
- SQL is nonprocedural, returning results in table form. Example SQL query to retrieve names of History department instructors:
select instructor.name from instructor where instructor.dept name = 'History';
1.4.5 Database Access from Application Programs:
- Application programs serve as intermediaries to execute DML statements and provide user interaction.
- Access to database involves communication through APIs such as ODBC for C and JDBC for Java.
1.5 Database Design
- Database systems are designed to manage vast amounts of data essential for enterprise operations.
- Database Schema Design: Primarily focuses on the feedback from users defining the data requirements and relationships needed for efficient data handling.
Phases of Database Design:
- Specification: Defined requirements based on user input by interacting with domain experts.
- Conceptual Schema: Translates requirements into a conceptual overview focusing on data interactions.
- Logical Design: The phase where mapping occurs between conceptual schemas and implementation structures.
- Physical Design: Dictates storage organization and access mechanisms.
1.6 Database Engine
- Database systems comprise modules that handle various responsibilities, including storage, query processing, and transaction management.
- Storage Manager: Manages data storage, interacts with the file manager for data organization, ensures integrity and authorization of access.
Components of Storage Manager:
- File Manager: Allocates disk space and manages structures.
- Authorization Manager: Ensures access rights for users based on defined permissions.
- Buffer Manager: Handles fetching data between disk and memory efficiently.
Query Processor Components:
- DDL Interpreter: Interprets DDL statements for data dictionaries.
- DML Compiler: Converts DML queries into low-level instructions, performing optimization to assess execution plans.
- Query Evaluation Engine: Executes the instructions following the evaluated plan.
Transaction Management:
- A transaction represents operations that maintain database integrity, supporting aspects like atomicity, consistency, and durability.
1.7 Database and Application Architecture
- Database System Architecture: User interaction occurs in a centralized server framework with connections among users, storage engines, and processing components.
- Two-Tier vs Three-Tier Architecture: Moving towards a three-tier architecture for modern applications promotes security and performance versus the simplicity of a two-tier system.
1.8 Database Users and Administrators
- Individuals interacting with databases can be categorized into users based on their interaction level and database administrators who hold authority over the system.
Database User Types:
- Naïve Users: Unsophisticated users who engage through predefined interfaces.
- Application Programmers: Develop applications for user interaction.
- Sophisticated Users: Analyze data without programming, using querying tools.
Database Administrators (DBA) Responsibilities:
- Schema Definition: Create original schemas using DDL statements.
- Storage and Access Methods: Define storage organization and access parameters.
- Authorization Management: Control users' access levels and enforce data security.
- Routine Maintenance: Backups, performance monitoring, etc.
1.9 History of Database Systems
- Evolution of data processing from manual methods to modern database systems spanning multiple decades.
Historical Milestones:
1950s-1960s: Transition from magnetic tapes to hard disks facilitating more direct data access.
1970s-1980s: Introduction and rise of relational models demonstrated by projects like System R and subsequent commercial products such as IBM DB2 and Oracle.
1990s: Emergence of decision support applications, increasing emphasis on web interfaces, and parallel database products.
2000s: Growth of semi-structured data handling (XML/JSON support), open-source systems (PostgreSQL, MySQL), evolving requirements for performance and scalability.
2010s: Progress towards improved consistency and abstraction in NoSQL systems leading to cloud storage solutions for enterprises.
The influential role of data analytics and consideration of privacy dimensions over information management has heightened the focus on data security and government regulation of data management practices.