Chapter 2: Database System Concepts and Architecture - VOCABULARY flashcards

2.1 Data Models, Schemas, and Instances

Characteristics of the database approach versus traditional file processing
- Database approach uses a single repository of data accessed by multiple users and applications through queries, transactions, and programs
- Traditional file processing builds separate files and programs for each application, causing data redundancy and maintenance inefficiencies
Traditional File Processing Example
- A grade reporting office maintains files on students and grades; programs to print transcripts and enter new grades are part of the application
- An accounting office maintains files on student fees and payments; separate files and programs exist for these needs
- Consequence: data redundancy leads to wasted storage and duplicated maintenance work to keep data up to date
Database Approach Example
- A single repository stores data defined once and accessed by many users via queries and programs
- Key advantages include shared data, reduced redundancy, and centralized management
Main characteristics of the database approach (vs. file processing)
- Self-describing nature of a database system
- Insulation between programs and data; data abstraction
- Support for multiple views of the data
- Data sharing and multiuser transaction processing
Data Models and Schemas
- Data abstraction: suppression of details about data organization/storage to highlight essential features for understanding
- Different users perceive data at different levels of detail through abstraction
Data Model
- A collection of concepts that describe the structure of a database
- Basic operations: specify retrievals and updates on the database
- Dynamic aspect or behavior: allows the database designer to specify a set of valid operations on database objects
Categories of Data Models
- High-level (conceptual) data models: close to how users perceive data
- Low-level (physical) data models: describe details of how data is stored on storage media
- Representational data models: easily understood by end users; similar to how data is organized on storage; hides many storage details but can be implemented directly
Conceptual Data Models
- Conceptual types include relational data model (most common in traditional commercial DBMSs), object data model (newer family of higher-level implementation models, closer to conceptual models), and semantic data model (e.g., Semantic Web)
Conceptual Features
- Entity: represents a real-world object or concept
- Relationship: an association among two or more entities
- Entity-Relationship (ER) model: models entities and their relationships
- Attribute: property of an entity; provides further description of an entity
Physical Data Models
- Physical: describes how data is stored as files in the computer
- Access path: structure that makes search for particular records more efficient
- Index: example of an access path; allows direct access to data using an index term or keyword
Schemas, Instances, and Database State
- Distinguish between the description of the database (schema) and the database itself (state)
Schema Diagram Example (Figure 2.1 in Figure 1.2)
- Entities/tables and their attributes:
- STUDENT(Name, Student_number, Class, Major)
- COURSE(Coursename, Coursenumber, Credit_hours, Department)
- PREREQUISITE(Coursenumber, Prerequisitenumber)
- SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
- GRADEREPORT(Studentnumber, Section_identifier, Grade)
Schema changes and evolution
- Schema changes are usually needed as requirements change
- Newer DBMS provide operations to allow schema changes, though the process is more involved than simple data updates
Notational notes about terminology
- It is customary to use schemas as the plural for schema, though schemata is the proper plural form
- The word scheme is sometimes used to refer to a schema
Define key concepts
- Define a new database
- Specify database schema to the DBMS
- Initial state: populated or loaded with initial data
- Valid state: satisfies the structure and constraints specified by the schema
- Schema evolution: changes applied to the schema as application requirements change
Figure 2.1 (Schema diagram) and column/field labeling visuals (as shown in slides)
Practical takeaway
- Understanding schemas vs. instances helps in reasoning about data definitions vs. actual data at a point in time

2.2 Three-Schema Architecture and Data Independence

Three-schema architecture overview
- Internal level: describes physical storage structure of the database
- Conceptual level: describes the structure of the whole database for a community of users
- External (view) level: describes part of the database that a particular user group is interested in
Architecture mapping
- External level -> external/view schemas
- Conceptual level -> conceptual/internal mappings
- Internal level -> internal/storage schema
External, conceptual, and internal levels (visual form)
- External Level: External View(s) tailored to user needs
- Conceptual Level: Conceptual Schema representing the overall database structure for all users
- Internal Level: Internal Schema describing physical storage details
- Mappings: External/Conceptual Mapping and Conceptual/Internal Mapping connect levels
Data independence
- Capacity to change the schema at one level without changing the higher levels
- Types of data independence
- Logical data independence: ability to change the conceptual schema without changing external schemas or applications
- Physical data independence: ability to change the internal/physical schema without changing the conceptual schema
- Practical implication: allows evolving DBMS technology or storage hardware without rewriting applications
Example implications
- Changing DBMS or disk drive can be done with logical/physical independence, preserving application interfaces and external views

2.3 Database Languages and Interfaces

DBMS languages and their roles
- Data Definition Language (DDL): defines data and storage schemas
- Storage Definition Language (SDL): specifies the internal schema (how data is stored on disk)
- View Definition Language (VDL): specifies user views and views-to-conceptual-schema mappings
- Data Manipulation Language (DML): retrieval, insertion, deletion, modification of data
- Query Definition Language (QDL): defines queries (often encompassed within DML in modern systems)
Additional language families
- Data Control Language (DCL): controls access permissions and security
- Transaction Control Language (TCL): controls transactions (commit, rollback, etc.)
- Open/Standard interfaces: ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity)
- VDL and QDL are often integrated into higher-level language features in modern systems
SDL vs. SQL and other DBMS languages
- SDL: internal schema details; multiple DBMS implementations may support different SDLs
- SQL: widely adopted master language that unifies many subsystems’ capabilities
- The textbook highlights a subset of languages; emphasis on SQL as a unifying standard
Common SDL/DDL/DML/VL tools and examples
- MySQL example: InnoDB as a default storage engine; support for multiple SDLs
DBMS user interfaces (UIs)
- Menu-based interfaces for Web clients or browsing
- Forms-based interfaces
- Graphical user interfaces (GUIs) such as SQL Server Management Studio (SSMS)
- Natural language interfaces
- Speech input/output interfaces
- Interfaces for parametric users (e.g., SQL parameterization for repeated queries)
- Interfaces for DBAs (administrative tooling and dashboards)
SQL injection example (security note)
- Demonstrates how unsanitized inputs can be exploited to alter or delete data
- Example dialogue shows an attacker injecting: "Robert'); DROP TABLE Students;--"
- Lesson: sanitize inputs and use prepared statements/parameterized queries; validate and escape inputs; use least-privilege database accounts
SQL Insertion Hack visuals (additional demonstrations)
- A tollbooth camera OCR example illustrating how unsanitized input can be used to delete data; reinforces input sanitization and secure coding practices

2.5 Centralized and Client/Server Architectures for DBMSs

Centralized DBMS architecture
- All DBMS functionality, application execution, and user interface processing occur on a single machine
- Historically associated with mainframes and dumb terminals; still widely used in some environments
Basic Client/Server Architectures
- Server roles include: file server, printer server, web server, email server, database server
- Client roles: provide user interfaces and local processing power; access server services
Two-tier client/server architectures
- Server handles query and transaction functionality related to SQL processing
- Client handles user interfaces and application programming
Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC)
- ODBC: API that allows client applications to call the DBMS; both client and server need appropriate software
- JDBC: Java-specific API for accessing one or more DBMSs through a standard interface
Three-tier and n-tier architectures for Web-based access
- Application server or Web server: intermediate layer between client and database server; runs application programs and enforces business rules
- N-tier architectures: further divide layers into finer components to separate concerns
Service-Oriented Architecture (SOA)
- Systems composed of many services that can reside on the same or different machines
- Database service, business process services, calendar service, etc.
Three-tier architecture (example structure)
- Presentation layer: client GUI or web interface
- Business logic layer: application server or Web server
- Data layer: database server and DBMS
- Visual depiction often includes an application layer between the presentation layer and the database server

2.6 Classification of Database Management Systems

Classification by data model
- Relational data model
- Object data model
- Hierarchical and network models (legacy systems)
- Native XML DBMS
Classification by number of users
- Single-user
- Multiuser
Classification by number of sites
- Centralized
- Distributed
- Homogeneous (same DBMS across sites)
- Heterogeneous (different DBMSs or data sources across sites)
Classification by cost/licensing
- Open source
- Commercial/licensing variations
Classification by access path options
- General-purpose vs. special-purpose access paths
Summary highlights
- DBMS classifications consider data model, user count, site distribution, access paths, and cost

Summary (Key Takeaways)

The database approach offers data abstraction, multiple views, data sharing, and multiuser transactions, reducing redundancy compared to traditional file systems
Three-schema architecture supports data independence and allows evolving storage/implementation details without disrupting user views or applications
A range of languages and interfaces (DDL, SDL, DML, VDL, QDL, DCL, TCL, SQL, ODBC/JDBC) enable definition, manipulation, and control of data and access
Architecture choices (centralized vs. client/server vs. multi-tier) impact scalability, maintenance, and deployment strategies in real-world systems
Classification criteria (data model, users, sites, access paths, cost) help in selecting appropriate DBMS for a given domain