Chapter 1: Databases and Database Users
What is a Database?
- A database is a collection of related data; a set of known facts that can be recorded and that have implicit meaning; a miniworld or Universe of Discourse (UoD); represents some aspect of the real world; logically coherent collection of data with inherent meaning (semantics); built for a specific purpose.
- Examples span traditional databases, multimedia databases (images, audio, video), GIS (maps, weather, satellite imagery), eCommerce (eBay, Amazon), data warehouses/OLAP for business analysis, and real-time/active database technology for control of industrial processes.
- Modern DBMSs can manage diverse data types and support analytical, transactional, and real-time workloads.
DBMS: Definition, Features, and Goals
- A DBMS is a suite of programs that allows a user to create and maintain a database.
- You can specify: data types, relationships between data elements, and constraints on stored data.
- Metadata describes the data and is stored in the database (also called a data dictionary).
- A schema is the structure of the database, a particular type of metadata.
- DBMSs allow manipulation of both schema and data: query to retrieve information, update operations (insert, modify, delete).
- DBMSs are generally multi-user and provide security: encryption of sensitive data and access controls for authorized users.
- DBMSs provide a programming interface (API) to interact with the database, enabling application development.
DBMS: Core Terminology
- Relational vs Conceptual vs OODB terminology:
- Database
- Relation
- Entity Type (Conceptual/ER model)
- Class (OODB terminology)
- Table (implementation of a relation)
- Field/Attribute/Instance variable/Column
- Tuple/Record/Entity Instance/Object/Row
- These mappings help connect abstract data models to concrete database implementations.
Implementation Terminology
- Table: A set of data records of the same format, with columns (same data type) and rows of related records.
- View: A subset (or a composite) of columns from one or more related tables.
- Stored Procedure: A parameterized set of SQL statements that can be executed as a unit (a macro).
DB Primitive Data Types
- Each column (field/attribute) has a specific data type.
- Common types include:
- Integers
- Fixed-length strings
- Variable-length strings
- Floating-point numbers
- Decimals
- Dates, times
- BLOBs (binary large objects)
1.2 An Example: UNIVERSITY Database
- Purpose: model information concerning students, courses, and grades in a university environment.
- Data records (tables):
- STUDENT(Name, Student_number, Class, Major)
- COURSE(Coursename, Coursenumber, Credit_hours, Department)
- SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
- GRADEREPORT(Studentnumber, Section_identifier, Grade)
- PREREQUISITE(Coursenumber, Prerequisitenumber)
- Design goals: store data to represent each entity and its relationships; enable querying and updating.
- Key design question: "What will you want to know from it?" Examples:
- Retrieve the transcript
- List names of students who took the section of the ‘Database’ course offered in fall 2013 and their grades in that section
- List the prerequisites of the ‘Database’ course
- Examples of updates:
- Change the class of ‘Smith’ to sophomore
- Create a new section for the ‘Database’ course for this semester
- Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester
- A textual depiction of the University database structure (relationships among tables):
- STUDENT(Name, Student_number, Class, Major)
- COURSE(Coursename, Coursenumber, Credit_hours, Department)
- SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
- GRADEREPORT(Studentnumber, Section_identifier, Grade)
- PREREQUISITE(Coursenumber, Prerequisitenumber)
- Sample data (illustrative):
- STUDENT
- Name: Smith, Student_number: 17, Class: 1, Major: CS
- Name: Brown, Student_number: 8, Class: 2, Major: CS
- COURSE
- Coursename: Intro to Computer Science, Coursenumber: CS1310, Credit_hours: 4, Department: CS
- Coursename: Data Structures, Coursenumber: CS3320, Credit_hours: 4, Department: CS
- Coursename: Discrete Mathematics, Coursenumber: MATH2410, Credit_hours: 3, Department: MATH
- Coursename: Database, Coursenumber: CS3380, Credit_hours: 3, Department: CS
- GRADE_REPORT
- Each row relates a Studentnumber, a Sectionidentifier, and a Grade (e.g., B, C, A)
- SECTION
- Sectionidentifier, Coursenumber, Semester, Year, Instructor
- PREREQUISITE
- Prerequisite relationships among courses (e.g., prerequisites for CS3380 or CS3320)
- Notes on data types and relationships: these tables illustrate how a university domain is modeled with relations and how foreign key-like connections link students to sections and grades to courses.
Phases for Designing a Database
- Requirements specification and analysis (software-engineering-like step)
- Conceptual design
- ER Model (Chapter 3) / EER Model (Chapter 4)
- Logical design
- Relational Model (Chapter 5)
- Physical design
1.3 Characteristics of the Database-Approach
- The Traditional File Approach
- Each user defines and implements files for a software application; file layouts typically embedded in code (not centralized).
- Common files may be used across applications, but their structure is not defined in a DBMS.
- The Database Approach
- A single repository maintains data defined once and accessed by multiple users; the database contains the data layout and structure.
- Self-describing nature of the database system; data abstraction and insulation between programs and data; support for multiple views; sharing and multiuser transaction processing.
- The database system contains a complete definition of structure and constraints.
- Metadata, or the Database Catalog, describes the structure of the database and is used by both DBMS software and database users who need information about the database structure.
- Example: A catalog that lists relations (tables) and their number of columns, plus per-column data types and which relation each column belongs to.
- Example catalog (illustrative):
- RELATIONS: STUDE NT (4), COURSE (4), SECTION (5), GRADE_REPORT (3), PREREQUISITE (2)
- COLUMNS: Name (STUDENT) Datatype Character(30); Studentnumber (STUDENT) Datatype Character(4); Class (STUDENT) Datatype Integer(1); Major (STUDENT) Datatype Majortype; Coursename (COURSE) Datatype Character(10); Coursenumber (COURSE) Datatype XXXXNNNN; Prerequisitenumber (PREREQUISITE) Datatype XXXXNNNN; Major_type is an enumerated type with all known majors; XXXXNNNN defines a type with four alphabetic characters followed by four digits.
Insulation Between Programs and Data; Program-Data Independence
- Program-data independence: The structure of data files is stored in the DBMS catalog separately from access programs.
- Program-operation independence: Operations are defined by interface (operation name and data types of arguments); the implementation can change without affecting the interface.
Support of Multiple Views of the Data
- View: A subset of the database; a virtual data subset derived from the database files, not explicitly stored.
- Multiuser DBMS must provide facilities for defining multiple views to accommodate different user needs and applications.
Sharing of Data and Multiuser Transaction Processing
- Multiple users can access the database concurrently; concurrency control software ensures updates are performed in a controlled, correct manner.
- Online Transaction Processing (OLTP) applications involve form-filling and other interactive transactions.
- Transaction: An executing program/process that includes one or more database accesses (read, update, etc.).
- Two key properties:
- Isolation: Each transaction appears to execute in isolation from other transactions.
- Atomicity: Either all operations in a transaction are executed or none are.
1.4 Actors on the Scene
- Database Administrators (DBA):
- Authorize access, coordinate and monitor use, acquire resources.
- Database Designers:
- Identify data to store, choose appropriate structures to represent and store data.
- End Users:
- Types: Casual end users, Naive or parametric end users, Sophisticated end users, Standalone users.
1.5 Workers Behind the Scenes
- DBMS system designers and implementers: Design and implement DBMS modules and interfaces; ensure compatibility with OS and compilers.
- Tool developers: Design and implement tools for database development and maintenance.
- Operators and maintenance personnel: Run and maintain the hardware/software environment for the database system.
1.6 Advantages of Using the DBMS Approach
- Controlling redundancy via data normalization; denormalization may be used strategically to improve query performance.
- Restricting unauthorized access; security and authorization subsystems; privileged software controls.
- Providing persistent storage for program objects; e.g., stored in an object-oriented DBMS to address impedance mismatch with programming languages.
- Providing backup and recovery facilities through a DBMS subsystem.
- Providing multiple user interfaces (e.g., GUI) for interacting with the data; supporting complex interrelated data.
- Enforcing integrity constraints: Referential integrity (e.g., a section must relate to a course), Key/Uniqueness constraints (e.g., Course_number must be unique), and business rules.
- Supporting inferencing and actions via rules (e.g., Deductive databases); triggers; stored procedures for enforcing rules.
- Other implications: Reduced application development time; flexibility; up-to-date information; economies of scale.
1.7 A Brief History of Database Applications
- Evolution:
- Hierarchical and Network Systems: intertwined conceptual relationships with physical storage.
- Relational Databases: data abstraction and application flexibility; separation of physical storage from conceptualization.
- Object-Oriented Applications: evolving data interchange via web and XML for e-commerce.
- Extending capabilities: scientific applications; storage/retrieval of images and videos; data mining; spatial/time-series applications; databases vs. information retrieval.
1.8 When Not to Use a DBMS
- In some cases, regular files are preferable:
- Simple, well-defined database applications unlikely to change.
- Stringent, real-time requirements that DBMS overhead cannot meet.
- Embedded systems with limited storage capacity.
- No need for concurrent/multiuser access to data.
Connections to Course Context and Practical Relevance
- The DBMS model separates data definitions (schema) from data usage (applications), enabling multiple views and multiuser access.
- The self-describing catalog supports metadata-driven design and tooling, aligning with modern database development practices.
- Understanding phases of design (requirements, conceptual/logical/physical) is essential for translating real-world needs into robust database schemas.
- Real-world scenarios (transcripts, prerequisites, course enrollment) illustrate how entities relate and how queries are formed to retrieve meaningful information.
Quick Reference: Key Concepts to Memorize
- Database vs DBMS: data repository vs the software that manages it
- Self-describing: metadata/catalog stores structure and constraints
- Insulation: programs vs data independence; interface vs implementation
- Views: derived, non-stored representations for different users
- ACID properties: Atomicity, Consistency, Isolation, Durability (transaction integrity guarantees)
- Roles: DBA, database designer, end users, system analysts, application programmers
- Advantages: data sharing, integrity, security, backup/recovery, flexibility
- When not to use: simplicity, real-time constraints, embedded systems, single-user access