Chapter 2: Database System Concepts and Architecture - VOCABULARY flashcards
2.1 Data Models, Schemas, and Instances
- Characteristics of the database approach versus traditional file processing
- Database approach uses a single repository of data accessed by multiple users and applications through queries, transactions, and programs
- Traditional file processing builds separate files and programs for each application, causing data redundancy and maintenance inefficiencies
- Traditional File Processing Example
- A grade reporting office maintains files on students and grades; programs to print transcripts and enter new grades are part of the application
- An accounting office maintains files on student fees and payments; separate files and programs exist for these needs
- Consequence: data redundancy leads to wasted storage and duplicated maintenance work to keep data up to date
- Database Approach Example
- A single repository stores data defined once and accessed by many users via queries and programs
- Key advantages include shared data, reduced redundancy, and centralized management
- Main characteristics of the database approach (vs. file processing)
- Self-describing nature of a database system
- Insulation between programs and data; data abstraction
- Support for multiple views of the data
- Data sharing and multiuser transaction processing
- Data Models and Schemas
- Data abstraction: suppression of details about data organization/storage to highlight essential features for understanding
- Different users perceive data at different levels of detail through abstraction
- Data Model
- A collection of concepts that describe the structure of a database
- Basic operations: specify retrievals and updates on the database
- Dynamic aspect or behavior: allows the database designer to specify a set of valid operations on database objects
- Categories of Data Models
- High-level (conceptual) data models: close to how users perceive data
- Low-level (physical) data models: describe details of how data is stored on storage media
- Representational data models: easily understood by end users; similar to how data is organized on storage; hides many storage details but can be implemented directly
- Conceptual Data Models
- Conceptual types include relational data model (most common in traditional commercial DBMSs), object data model (newer family of higher-level implementation models, closer to conceptual models), and semantic data model (e.g., Semantic Web)
- Conceptual Features
- Entity: represents a real-world object or concept
- Relationship: an association among two or more entities
- Entity-Relationship (ER) model: models entities and their relationships
- Attribute: property of an entity; provides further description of an entity
- Physical Data Models
- Physical: describes how data is stored as files in the computer
- Access path: structure that makes search for particular records more efficient
- Index: example of an access path; allows direct access to data using an index term or keyword
- Schemas, Instances, and Database State
- Distinguish between the description of the database (schema) and the database itself (state)
- Schema Diagram Example (Figure 2.1 in Figure 1.2)
- Entities/tables and their attributes:
- STUDENT(Name, Student_number, Class, Major)
- COURSE(Coursename, Coursenumber, Credit_hours, Department)
- PREREQUISITE(Coursenumber, Prerequisitenumber)
- SECTION(Sectionidentifier, Coursenumber, Semester, Year, Instructor)
- GRADEREPORT(Studentnumber, Section_identifier, Grade)
- Schema changes and evolution
- Schema changes are usually needed as requirements change
- Newer DBMS provide operations to allow schema changes, though the process is more involved than simple data updates
- Notational notes about terminology
- It is customary to use schemas as the plural for schema, though schemata is the proper plural form
- The word scheme is sometimes used to refer to a schema
- Define key concepts
- Define a new database
- Specify database schema to the DBMS
- Initial state: populated or loaded with initial data
- Valid state: satisfies the structure and constraints specified by the schema
- Schema evolution: changes applied to the schema as application requirements change
- Figure 2.1 (Schema diagram) and column/field labeling visuals (as shown in slides)
- Practical takeaway
- Understanding schemas vs. instances helps in reasoning about data definitions vs. actual data at a point in time
2.2 Three-Schema Architecture and Data Independence
- Three-schema architecture overview
- Internal level: describes physical storage structure of the database
- Conceptual level: describes the structure of the whole database for a community of users
- External (view) level: describes part of the database that a particular user group is interested in
- Architecture mapping
- External level -> external/view schemas
- Conceptual level -> conceptual/internal mappings
- Internal level -> internal/storage schema
- External, conceptual, and internal levels (visual form)
- External Level: External View(s) tailored to user needs
- Conceptual Level: Conceptual Schema representing the overall database structure for all users
- Internal Level: Internal Schema describing physical storage details
- Mappings: External/Conceptual Mapping and Conceptual/Internal Mapping connect levels
- Data independence
- Capacity to change the schema at one level without changing the higher levels
- Types of data independence
- Logical data independence: ability to change the conceptual schema without changing external schemas or applications
- Physical data independence: ability to change the internal/physical schema without changing the conceptual schema
- Practical implication: allows evolving DBMS technology or storage hardware without rewriting applications
- Example implications
- Changing DBMS or disk drive can be done with logical/physical independence, preserving application interfaces and external views
2.3 Database Languages and Interfaces
- DBMS languages and their roles
- Data Definition Language (DDL): defines data and storage schemas
- Storage Definition Language (SDL): specifies the internal schema (how data is stored on disk)
- View Definition Language (VDL): specifies user views and views-to-conceptual-schema mappings
- Data Manipulation Language (DML): retrieval, insertion, deletion, modification of data
- Query Definition Language (QDL): defines queries (often encompassed within DML in modern systems)
- Additional language families
- Data Control Language (DCL): controls access permissions and security
- Transaction Control Language (TCL): controls transactions (commit, rollback, etc.)
- Open/Standard interfaces: ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity)
- VDL and QDL are often integrated into higher-level language features in modern systems
- SDL vs. SQL and other DBMS languages
- SDL: internal schema details; multiple DBMS implementations may support different SDLs
- SQL: widely adopted master language that unifies many subsystems’ capabilities
- The textbook highlights a subset of languages; emphasis on SQL as a unifying standard
- Common SDL/DDL/DML/VL tools and examples
- MySQL example: InnoDB as a default storage engine; support for multiple SDLs
- DBMS user interfaces (UIs)
- Menu-based interfaces for Web clients or browsing
- Forms-based interfaces
- Graphical user interfaces (GUIs) such as SQL Server Management Studio (SSMS)
- Natural language interfaces
- Speech input/output interfaces
- Interfaces for parametric users (e.g., SQL parameterization for repeated queries)
- Interfaces for DBAs (administrative tooling and dashboards)
- SQL injection example (security note)
- Demonstrates how unsanitized inputs can be exploited to alter or delete data
- Example dialogue shows an attacker injecting: "Robert'); DROP TABLE Students;--"
- Lesson: sanitize inputs and use prepared statements/parameterized queries; validate and escape inputs; use least-privilege database accounts
- SQL Insertion Hack visuals (additional demonstrations)
- A tollbooth camera OCR example illustrating how unsanitized input can be used to delete data; reinforces input sanitization and secure coding practices
2.5 Centralized and Client/Server Architectures for DBMSs
- Centralized DBMS architecture
- All DBMS functionality, application execution, and user interface processing occur on a single machine
- Historically associated with mainframes and dumb terminals; still widely used in some environments
- Basic Client/Server Architectures
- Server roles include: file server, printer server, web server, email server, database server
- Client roles: provide user interfaces and local processing power; access server services
- Two-tier client/server architectures
- Server handles query and transaction functionality related to SQL processing
- Client handles user interfaces and application programming
- Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC)
- ODBC: API that allows client applications to call the DBMS; both client and server need appropriate software
- JDBC: Java-specific API for accessing one or more DBMSs through a standard interface
- Three-tier and n-tier architectures for Web-based access
- Application server or Web server: intermediate layer between client and database server; runs application programs and enforces business rules
- N-tier architectures: further divide layers into finer components to separate concerns
- Service-Oriented Architecture (SOA)
- Systems composed of many services that can reside on the same or different machines
- Database service, business process services, calendar service, etc.
- Three-tier architecture (example structure)
- Presentation layer: client GUI or web interface
- Business logic layer: application server or Web server
- Data layer: database server and DBMS
- Visual depiction often includes an application layer between the presentation layer and the database server
2.6 Classification of Database Management Systems
- Classification by data model
- Relational data model
- Object data model
- Hierarchical and network models (legacy systems)
- Native XML DBMS
- Classification by number of users
- Classification by number of sites
- Centralized
- Distributed
- Homogeneous (same DBMS across sites)
- Heterogeneous (different DBMSs or data sources across sites)
- Classification by cost/licensing
- Open source
- Commercial/licensing variations
- Classification by access path options
- General-purpose vs. special-purpose access paths
- Summary highlights
- DBMS classifications consider data model, user count, site distribution, access paths, and cost
Summary (Key Takeaways)
- The database approach offers data abstraction, multiple views, data sharing, and multiuser transactions, reducing redundancy compared to traditional file systems
- Three-schema architecture supports data independence and allows evolving storage/implementation details without disrupting user views or applications
- A range of languages and interfaces (DDL, SDL, DML, VDL, QDL, DCL, TCL, SQL, ODBC/JDBC) enable definition, manipulation, and control of data and access
- Architecture choices (centralized vs. client/server vs. multi-tier) impact scalability, maintenance, and deployment strategies in real-world systems
- Classification criteria (data model, users, sites, access paths, cost) help in selecting appropriate DBMS for a given domain