N

Chapter 1: Databases and Database Users

What is a Database?

  • A database is a collection of related data; a set of known facts that can be recorded and that have implicit meaning; a miniworld or Universe of Discourse (UoD); represents some aspect of the real world; logically coherent collection of data with inherent meaning (semantics); built for a specific purpose.
  • Examples span traditional databases, multimedia databases (images, audio, video), GIS (maps, weather, satellite imagery), eCommerce (eBay, Amazon), data warehouses/OLAP for business analysis, and real-time/active database technology for control of industrial processes.
  • Modern DBMSs can manage diverse data types and support analytical, transactional, and real-time workloads.

DBMS: Definition, Features, and Goals

  • A DBMS is a suite of programs that allows a user to create and maintain a database.
    • You can specify: data types, relationships between data elements, and constraints on stored data.
    • Metadata describes the data and is stored in the database (also called a data dictionary).
    • A schema is the structure of the database, a particular type of metadata.
  • DBMSs allow manipulation of both schema and data: query to retrieve information, update operations (insert, modify, delete).
  • DBMSs are generally multi-user and provide security: encryption of sensitive data and access controls for authorized users.
  • DBMSs provide a programming interface (API) to interact with the database, enabling application development.

DBMS: Core Terminology

  • Relational vs Conceptual vs OODB terminology:
    • Database
    • Relation
    • Entity Type (Conceptual/ER model)
    • Class (OODB terminology)
    • Table (implementation of a relation)
    • Field/Attribute/Instance variable/Column
    • Tuple/Record/Entity Instance/Object/Row
  • These mappings help connect abstract data models to concrete database implementations.

Implementation Terminology

  • Table: A set of data records of the same format, with columns (same data type) and rows of related records.
  • View: A subset (or a composite) of columns from one or more related tables.
  • Stored Procedure: A parameterized set of SQL statements that can be executed as a unit (a macro).

DB Primitive Data Types

  • Each column (field/attribute) has a specific data type.
  • Common types include:
    • Integers
    • Fixed-length strings
    • Variable-length strings
    • Floating-point numbers
    • Decimals
    • Dates, times
    • BLOBs (binary large objects)

1.2 An Example: UNIVERSITY Database

  • Purpose: model information concerning students, courses, and grades in a university environment.
  • Data records (tables):
    • STUDENT(Name, Student_number, Class, Major)
    • COURSE(Coursename, Coursenumber, Credit_hours, Department)
    • SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
    • GRADEREPORT(Studentnumber, Section_identifier, Grade)
    • PREREQUISITE(Coursenumber, Prerequisitenumber)
  • Design goals: store data to represent each entity and its relationships; enable querying and updating.
  • Key design question: "What will you want to know from it?" Examples:
    • Retrieve the transcript
    • List names of students who took the section of the ‘Database’ course offered in fall 2013 and their grades in that section
    • List the prerequisites of the ‘Database’ course
  • Examples of updates:
    • Change the class of ‘Smith’ to sophomore
    • Create a new section for the ‘Database’ course for this semester
    • Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester
  • A textual depiction of the University database structure (relationships among tables):
    • STUDENT(Name, Student_number, Class, Major)
    • COURSE(Coursename, Coursenumber, Credit_hours, Department)
    • SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
    • GRADEREPORT(Studentnumber, Section_identifier, Grade)
    • PREREQUISITE(Coursenumber, Prerequisitenumber)
  • Sample data (illustrative):
    • STUDENT
    • Name: Smith, Student_number: 17, Class: 1, Major: CS
    • Name: Brown, Student_number: 8, Class: 2, Major: CS
    • COURSE
    • Coursename: Intro to Computer Science, Coursenumber: CS1310, Credit_hours: 4, Department: CS
    • Coursename: Data Structures, Coursenumber: CS3320, Credit_hours: 4, Department: CS
    • Coursename: Discrete Mathematics, Coursenumber: MATH2410, Credit_hours: 3, Department: MATH
    • Coursename: Database, Coursenumber: CS3380, Credit_hours: 3, Department: CS
    • GRADE_REPORT
    • Each row relates a Studentnumber, a Sectionidentifier, and a Grade (e.g., B, C, A)
    • SECTION
    • Sectionidentifier, Coursenumber, Semester, Year, Instructor
    • PREREQUISITE
    • Prerequisite relationships among courses (e.g., prerequisites for CS3380 or CS3320)
  • Notes on data types and relationships: these tables illustrate how a university domain is modeled with relations and how foreign key-like connections link students to sections and grades to courses.

Phases for Designing a Database

  • Requirements specification and analysis (software-engineering-like step)
  • Conceptual design
    • ER Model (Chapter 3) / EER Model (Chapter 4)
  • Logical design
    • Relational Model (Chapter 5)
  • Physical design

1.3 Characteristics of the Database-Approach

  • The Traditional File Approach
    • Each user defines and implements files for a software application; file layouts typically embedded in code (not centralized).
    • Common files may be used across applications, but their structure is not defined in a DBMS.
  • The Database Approach
    • A single repository maintains data defined once and accessed by multiple users; the database contains the data layout and structure.
    • Self-describing nature of the database system; data abstraction and insulation between programs and data; support for multiple views; sharing and multiuser transaction processing.

Self-Describing Database and Metadata

  • The database system contains a complete definition of structure and constraints.
  • Metadata, or the Database Catalog, describes the structure of the database and is used by both DBMS software and database users who need information about the database structure.
  • Example: A catalog that lists relations (tables) and their number of columns, plus per-column data types and which relation each column belongs to.
  • Example catalog (illustrative):
    • RELATIONS: STUDE NT (4), COURSE (4), SECTION (5), GRADE_REPORT (3), PREREQUISITE (2)
    • COLUMNS: Name (STUDENT) Datatype Character(30); Studentnumber (STUDENT) Datatype Character(4); Class (STUDENT) Datatype Integer(1); Major (STUDENT) Datatype Majortype; Coursename (COURSE) Datatype Character(10); Coursenumber (COURSE) Datatype XXXXNNNN; Prerequisitenumber (PREREQUISITE) Datatype XXXXNNNN; Major_type is an enumerated type with all known majors; XXXXNNNN defines a type with four alphabetic characters followed by four digits.

Insulation Between Programs and Data; Program-Data Independence

  • Program-data independence: The structure of data files is stored in the DBMS catalog separately from access programs.
  • Program-operation independence: Operations are defined by interface (operation name and data types of arguments); the implementation can change without affecting the interface.

Support of Multiple Views of the Data

  • View: A subset of the database; a virtual data subset derived from the database files, not explicitly stored.
  • Multiuser DBMS must provide facilities for defining multiple views to accommodate different user needs and applications.

Sharing of Data and Multiuser Transaction Processing

  • Multiple users can access the database concurrently; concurrency control software ensures updates are performed in a controlled, correct manner.
  • Online Transaction Processing (OLTP) applications involve form-filling and other interactive transactions.
  • Transaction: An executing program/process that includes one or more database accesses (read, update, etc.).
  • Two key properties:
    • Isolation: Each transaction appears to execute in isolation from other transactions.
    • Atomicity: Either all operations in a transaction are executed or none are.

1.4 Actors on the Scene

  • Database Administrators (DBA):
    • Authorize access, coordinate and monitor use, acquire resources.
  • Database Designers:
    • Identify data to store, choose appropriate structures to represent and store data.
  • End Users:
    • Types: Casual end users, Naive or parametric end users, Sophisticated end users, Standalone users.

1.5 Workers Behind the Scenes

  • DBMS system designers and implementers: Design and implement DBMS modules and interfaces; ensure compatibility with OS and compilers.
  • Tool developers: Design and implement tools for database development and maintenance.
  • Operators and maintenance personnel: Run and maintain the hardware/software environment for the database system.

1.6 Advantages of Using the DBMS Approach

  • Controlling redundancy via data normalization; denormalization may be used strategically to improve query performance.
  • Restricting unauthorized access; security and authorization subsystems; privileged software controls.
  • Providing persistent storage for program objects; e.g., stored in an object-oriented DBMS to address impedance mismatch with programming languages.
  • Providing backup and recovery facilities through a DBMS subsystem.
  • Providing multiple user interfaces (e.g., GUI) for interacting with the data; supporting complex interrelated data.
  • Enforcing integrity constraints: Referential integrity (e.g., a section must relate to a course), Key/Uniqueness constraints (e.g., Course_number must be unique), and business rules.
  • Supporting inferencing and actions via rules (e.g., Deductive databases); triggers; stored procedures for enforcing rules.
  • Other implications: Reduced application development time; flexibility; up-to-date information; economies of scale.

1.7 A Brief History of Database Applications

  • Evolution:
    • Hierarchical and Network Systems: intertwined conceptual relationships with physical storage.
    • Relational Databases: data abstraction and application flexibility; separation of physical storage from conceptualization.
    • Object-Oriented Applications: evolving data interchange via web and XML for e-commerce.
  • Extending capabilities: scientific applications; storage/retrieval of images and videos; data mining; spatial/time-series applications; databases vs. information retrieval.

1.8 When Not to Use a DBMS

  • In some cases, regular files are preferable:
    • Simple, well-defined database applications unlikely to change.
    • Stringent, real-time requirements that DBMS overhead cannot meet.
    • Embedded systems with limited storage capacity.
    • No need for concurrent/multiuser access to data.

Connections to Course Context and Practical Relevance

  • The DBMS model separates data definitions (schema) from data usage (applications), enabling multiple views and multiuser access.
  • The self-describing catalog supports metadata-driven design and tooling, aligning with modern database development practices.
  • Understanding phases of design (requirements, conceptual/logical/physical) is essential for translating real-world needs into robust database schemas.
  • Real-world scenarios (transcripts, prerequisites, course enrollment) illustrate how entities relate and how queries are formed to retrieve meaningful information.

Quick Reference: Key Concepts to Memorize

  • Database vs DBMS: data repository vs the software that manages it
  • Self-describing: metadata/catalog stores structure and constraints
  • Insulation: programs vs data independence; interface vs implementation
  • Views: derived, non-stored representations for different users
  • ACID properties: Atomicity, Consistency, Isolation, Durability (transaction integrity guarantees)
  • Roles: DBA, database designer, end users, system analysts, application programmers
  • Advantages: data sharing, integrity, security, backup/recovery, flexibility
  • When not to use: simplicity, real-time constraints, embedded systems, single-user access