Chapter 1: Databases and Database Users
What is a Database?
A database is a collection of related data; a set of known facts that can be recorded and that have implicit meaning; a miniworld or Universe of Discourse (UoD); represents some aspect of the real world; logically coherent collection of data with inherent meaning (semantics); built for a specific purpose.
Examples span traditional databases, multimedia databases (images, audio, video), GIS (maps, weather, satellite imagery), eCommerce (eBay, Amazon), data warehouses/OLAP for business analysis, and real-time/active database technology for control of industrial processes.
Modern DBMSs can manage diverse data types and support analytical, transactional, and real-time workloads.
DBMS: Definition, Features, and Goals
A DBMS is a suite of programs that allows a user to create and maintain a database.
You can specify: data types, relationships between data elements, and constraints on stored data.
Metadata describes the data and is stored in the database (also called a data dictionary).
A schema is the structure of the database, a particular type of metadata.
DBMSs allow manipulation of both schema and data: query to retrieve information, update operations (insert, modify, delete).
DBMSs are generally multi-user and provide security: encryption of sensitive data and access controls for authorized users.
DBMSs provide a programming interface (API) to interact with the database, enabling application development.
DBMS: Core Terminology
Relational vs Conceptual vs OODB terminology:
Database
Relation
Entity Type (Conceptual/ER model)
Class (OODB terminology)
Table (implementation of a relation)
Field/Attribute/Instance variable/Column
Tuple/Record/Entity Instance/Object/Row
These mappings help connect abstract data models to concrete database implementations.
Implementation Terminology
Table: A set of data records of the same format, with columns (same data type) and rows of related records.
View: A subset (or a composite) of columns from one or more related tables.
Stored Procedure: A parameterized set of SQL statements that can be executed as a unit (a macro).
DB Primitive Data Types
Each column (field/attribute) has a specific data type.
Common types include:
Integers
Fixed-length strings
Variable-length strings
Floating-point numbers
Decimals
Dates, times
BLOBs (binary large objects)
1.2 An Example: UNIVERSITY Database
Purpose: model information concerning students, courses, and grades in a university environment.
Data records (tables):
STUDENT(Name, Student_number, Class, Major)
COURSE(Coursename, Coursenumber, Credit_hours, Department)
SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
GRADEREPORT(Studentnumber, Section_identifier, Grade)
PREREQUISITE(Coursenumber, Prerequisitenumber)
Design goals: store data to represent each entity and its relationships; enable querying and updating.
Key design question: "What will you want to know from it?" Examples:
Retrieve the transcript
List names of students who took the section of the ‘Database’ course offered in fall 2013 and their grades in that section
List the prerequisites of the ‘Database’ course
Examples of updates:
Change the class of ‘Smith’ to sophomore
Create a new section for the ‘Database’ course for this semester
Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester
A textual depiction of the University database structure (relationships among tables):
STUDENT(Name, Student_number, Class, Major)
COURSE(Coursename, Coursenumber, Credit_hours, Department)
SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
GRADEREPORT(Studentnumber, Section_identifier, Grade)
PREREQUISITE(Coursenumber, Prerequisitenumber)
Sample data (illustrative):
STUDENT
Name: Smith, Student_number: , Class: , Major: CS
Name: Brown, Student_number: , Class: , Major: CS
COURSE
Coursename: Intro to Computer Science, Coursenumber: CS1310, Credit_hours: , Department: CS
Coursename: Data Structures, Coursenumber: CS3320, Credit_hours: , Department: CS
Coursename: Discrete Mathematics, Coursenumber: MATH2410, Credit_hours: , Department: MATH
Coursename: Database, Coursenumber: CS3380, Credit_hours: , Department: CS
GRADE_REPORT
Each row relates a Studentnumber, a Sectionidentifier, and a Grade (e.g., B, C, A)
SECTION
Sectionidentifier, Coursenumber, Semester, Year, Instructor
PREREQUISITE
Prerequisite relationships among courses (e.g., prerequisites for CS3380 or CS3320)
Notes on data types and relationships: these tables illustrate how a university domain is modeled with relations and how foreign key-like connections link students to sections and grades to courses.
Phases for Designing a Database
Requirements specification and analysis (software-engineering-like step)
Conceptual design
ER Model (Chapter 3) / EER Model (Chapter 4)
Logical design
Relational Model (Chapter 5)
Physical design
1.3 Characteristics of the Database-Approach
The Traditional File Approach
Each user defines and implements files for a software application; file layouts typically embedded in code (not centralized).
Common files may be used across applications, but their structure is not defined in a DBMS.
The Database Approach
A single repository maintains data defined once and accessed by multiple users; the database contains the data layout and structure.
Self-describing nature of the database system; data abstraction and insulation between programs and data; support for multiple views; sharing and multiuser transaction processing.
Self-Describing Database and Metadata
The database system contains a complete definition of structure and constraints.
Metadata, or the Database Catalog, describes the structure of the database and is used by both DBMS software and database users who need information about the database structure.
Example: A catalog that lists relations (tables) and their number of columns, plus per-column data types and which relation each column belongs to.
Example catalog (illustrative):
RELATIONS: STUDE NT (4), COURSE (4), SECTION (5), GRADE_REPORT (3), PREREQUISITE (2)
COLUMNS: Name (STUDENT) Datatype Character(30); Studentnumber (STUDENT) Datatype Character(4); Class (STUDENT) Datatype Integer(1); Major (STUDENT) Datatype Majortype; Coursename (COURSE) Datatype Character(10); Coursenumber (COURSE) Datatype XXXXNNNN; Prerequisitenumber (PREREQUISITE) Datatype XXXXNNNN; Major_type is an enumerated type with all known majors; XXXXNNNN defines a type with four alphabetic characters followed by four digits.
Insulation Between Programs and Data; Program-Data Independence
Program-data independence: The structure of data files is stored in the DBMS catalog separately from access programs.
Program-operation independence: Operations are defined by interface (operation name and data types of arguments); the implementation can change without affecting the interface.
Support of Multiple Views of the Data
View: A subset of the database; a virtual data subset derived from the database files, not explicitly stored.
Multiuser DBMS must provide facilities for defining multiple views to accommodate different user needs and applications.
Sharing of Data and Multiuser Transaction Processing
Multiple users can access the database concurrently; concurrency control software ensures updates are performed in a controlled, correct manner.
Online Transaction Processing (OLTP) applications involve form-filling and other interactive transactions.
Transaction: An executing program/process that includes one or more database accesses (read, update, etc.).
Two key properties:
Isolation: Each transaction appears to execute in isolation from other transactions.
Atomicity: Either all operations in a transaction are executed or none are.
1.4 Actors on the Scene
Database Administrators (DBA):
Authorize access, coordinate and monitor use, acquire resources.
Database Designers:
Identify data to store, choose appropriate structures to represent and store data.
End Users:
Types: Casual end users, Naive or parametric end users, Sophisticated end users, Standalone users.
1.5 Workers Behind the Scenes
DBMS system designers and implementers: Design and implement DBMS modules and interfaces; ensure compatibility with OS and compilers.
Tool developers: Design and implement tools for database development and maintenance.
Operators and maintenance personnel: Run and maintain the hardware/software environment for the database system.
1.6 Advantages of Using the DBMS Approach
Controlling redundancy via data normalization; denormalization may be used strategically to improve query performance.
Restricting unauthorized access; security and authorization subsystems; privileged software controls.
Providing persistent storage for program objects; e.g., stored in an object-oriented DBMS to address impedance mismatch with programming languages.
Providing backup and recovery facilities through a DBMS subsystem.
Providing multiple user interfaces (e.g., GUI) for interacting with the data; supporting complex interrelated data.
Enforcing integrity constraints: Referential integrity (e.g., a section must relate to a course), Key/Uniqueness constraints (e.g., Course_number must be unique), and business rules.
Supporting inferencing and actions via rules (e.g., Deductive databases); triggers; stored procedures for enforcing rules.
Other implications: Reduced application development time; flexibility; up-to-date information; economies of scale.
1.7 A Brief History of Database Applications
Evolution:
Hierarchical and Network Systems: intertwined conceptual relationships with physical storage.
Relational Databases: data abstraction and application flexibility; separation of physical storage from conceptualization.
Object-Oriented Applications: evolving data interchange via web and XML for e-commerce.
Extending capabilities: scientific applications; storage/retrieval of images and videos; data mining; spatial/time-series applications; databases vs. information retrieval.
1.8 When Not to Use a DBMS
In some cases, regular files are preferable:
Simple, well-defined database applications unlikely to change.
Stringent, real-time requirements that DBMS overhead cannot meet.
Embedded systems with limited storage capacity.
No need for concurrent/multiuser access to data.
Connections to Course Context and Practical Relevance
The DBMS model separates data definitions (schema) from data usage (applications), enabling multiple views and multiuser access.
The self-describing catalog supports metadata-driven design and tooling, aligning with modern database development practices.
Understanding phases of design (requirements, conceptual/logical/physical) is essential for translating real-world needs into robust database schemas.
Real-world scenarios (transcripts, prerequisites, course enrollment) illustrate how entities relate and how queries are formed to retrieve meaningful information.
Quick Reference: Key Concepts to Memorize
Database vs DBMS: data repository vs the software that manages it
Self-describing: metadata/catalog stores structure and constraints
Insulation: programs vs data independence; interface vs implementation
Views: derived, non-stored representations for different users
ACID properties: Atomicity, Consistency, Isolation, Durability (transaction integrity guarantees)
Roles: DBA, database designer, end users, system analysts, application programmers
Advantages: data sharing, integrity, security, backup/recovery, flexibility
When not to use: simplicity, real-time constraints, embedded systems, single-user access