Unit 1 – Introduction to Database Management Systems
Background
- A database is an organized collection of logically-related data that describes a specific enterprise or domain.
- Database Management System (DBMS) = Database + set of pre-written programs that store, update, retrieve, secure, and manage the data.
- Accepts requests from application programs, instructs the operating system to move the correct data, and enforces protection, security, and consistency (especially for multi-user environments).
- Widely used products include MySQL, Oracle, SQL Server, PostgreSQL, MongoDB, Sybase, Informix, etc.
Core Terminology
- Data – raw facts that can be captured in text, numbers, images, audio, video, graphs, or document files. Example: name, age, gender, weight.
- Information – data that have been processed/organized so that they reveal meaning and aid decision making.
- Metadata – “data about data”; e.g., field name, data type, length, min/max values, description, constraints; usually kept in a central repository (system catalog).
- Database – the overall collection of inter-related data of any size or complexity; structured to be easily stored, manipulated, and retrieved.
- Data warehouse / data mart – separate historical & summarized database designed for reporting and analytical processing.
- Field (Data Item) – single character or group of characters with specific meaning; e.g., customer_id, city.
- Record (Tuple) – collection of logically related fields; e.g., (id, name, address, city).
- Table / Relation – collection of related records; columns are Fields/Attributes/Domains, rows are Tuples/Records.
Purpose of a DBMS
- Replaces early file-processing systems where each application owned its own data files.
- Provides unified environment for insertion, update, deletion, and retrieval without writing a new low-level program for every task.
Shortcomings of Conventional File Processing (Pre-DBMS)
- Data redundancy → duplicated values, wasted storage, slower access.
- Data inconsistency → different copies disagree (e.g., changed employee address appears in HR but not Payroll).
- Data isolation → data scattered across formats/files; new programs must know every format.
- Difficult data access → no high-level query capability.
- Integrity problems → hard to enforce business rules or domain constraints across many files; risks to physical & logical integrity.
- Atomicity problems → partial updates if program crashes (violates the "all-or-nothing" rule).
- Concurrent-access anomalies → unsupervised simultaneous updates produce wrong results (e.g., two withdrawals on same account leave balance 4000 or 3000 when correct is 2000).
- Security problems → ad-hoc programs make fine-grained access control almost impossible.
Characteristics of Database Systems
- Model a slice of the real world and preserve logical relationships among data.
- Provide multiple views of the same data for different stakeholders.
- Support efficient and convenient insertion, deletion, and modification while maintaining integrity, atomicity, concurrency control, and security.
Advantages of DBMS
- Eliminates redundancy; centralizes data definition.
- Powerful query languages allow flexible retrieval.
- Logical isolation of data into separate tables improves organization.
- Built-in integrity constraints uphold accuracy.
- Enforces atomic, consistent, isolated, durable (ACID) transactions.
- Supports synchronized concurrent access by many users.
- Fine-grained security policies restrict who sees what.
Disadvantages of DBMS
- Complex design and time-consuming initial setup.
- Significant hardware/software investment; maintenance cost.
- Failure or corruption in one component can compromise the entire database.
- High conversion cost when migrating from legacy file systems.
- Staff must be trained (DBA, developers, power users).
Database System Applications (Typical Domains)
- Airlines/Railways – reservations, timetable display.
- Banking – accounts, loans, ATM & online transactions.
- Education – course registration, results, student services.
- Telecommunications – network inventory, call detail records, billing.
- Credit-card processing – purchase tracking, statement generation.
- E-commerce – product catalog integration, orders, payments.
- Health Care – electronic medical/health records, patient management.
Data Models (Logical Models)
- Hierarchical Model
- Tree structure starting from a single root; each child has one parent.
- Supports one-to-many relationships (e.g., department → courses → students).
- Network Model
- General graph structure; records can have multiple parents (many-to-many capability); faster traversal than hierarchy; dominated pre-relational era.
- Entity-Relationship (E-R) Model
- Conceptual design approach: split reality into entities (objects) and attributes (properties); connect entities with relationships.
- Good for high-level design then mapped to relational tables.
- Relational Model (E. F. Codd, 1970)
- Data organized as two-dimensional tables (relations); relationships maintained by common keys/fields; mathematically founded on set theory & first-order predicate logic.
- Object-Oriented Model
- Treats data as objects grouped into classes with attributes & methods, supporting encapsulation, inheritance, reusability.
Database Architectures
- One-tier – user sits directly at DBMS console; any change goes straight to database (used mainly by DBAs for admin, prototyping).
- Two-tier (Client–Server)
- GUI or application logic on client; DBMS engine on server; direct network connection for queries/transactions; server handles query processing & transaction management.
- Three-tier
- Adds application (middle) tier → clients talk to app server; app server talks to database. Isolation improves scalability, security, and is common in large web systems.
ANSI–SPARC Three-Level Data Abstraction
- Physical / Internal Level
- Lowest level; describes actual storage structures, indexes, file organization; chosen & tuned by DBA.
- Conceptual / Logical Level
- Middle level; describes entire database in terms of logical tables, relationships, constraints; hides physical details.
- External / View Level
- Highest level; multiple user-specific views or subsets; shields users from logical & physical complexity.
Mapping Between Levels
- Conceptual ↔ Internal Mapping – couples conceptual schema to physical storage; must change if storage layout changes.
- External ↔ Conceptual Mapping – couples each user view to the conceptual schema; multiple such mappings coexist.
Data Independence
- Ability to change one schema level without impacting the next higher level.
- Physical Data Independence – internal changes (e.g., new indexes, file splits) do not affect conceptual schema.
- Logical Data Independence – conceptual changes (e.g., splitting table, adding attribute) do not force changes to external views.
DBMS Functional Components
- Query Processor Units
- DDL Interpreter – converts DDL statements to metadata tables.
- DML Compiler – translates DML statements to low-level execution plan.
- Embedded DML Pre-compiler – turns DML embedded in host languages into host procedure calls.
- Query Evaluation Engine – executes compiled plans.
- Storage Manager Units
- Authorization Manager – checks user privileges.
- Integrity Manager – enforces constraints.
- Transaction Manager – guarantees atomicity & concurrency control.
- File Manager – allocates/deallocates disk space.
- Buffer Manager – pages data between disk and RAM.
- Supporting Data Structures
- Data Files – user data.
- Data Dictionary / System Catalog – metadata (heavily accessed).
- Indices – secondary structures for fast access.
- Statistical Data – histograms, selectivity stats for query optimizer.
Database Languages
| Category | Purpose | Typical Keywords |
|---|
| Data Definition Language (DDL) | Define or alter schema | CREATE, ALTER, DROP, TRUNCATE, RENAME, COMMENT |
| Data Manipulation Language (DML) | Retrieve & modify data | SELECT, INSERT, UPDATE, DELETE, MERGE |
| Data Control Language (DCL) | Control access | GRANT, REVOKE |
| Transaction Control Language (TCL) | Manage transactions | COMMIT, ROLLBACK, SAVEPOINT, SET AUTOCOMMIT |
- Two DML “styles”: Procedural (specify what and how) vs Declarative (specify only what; optimizer decides how).
Database Users
- Application Programmers – professionals writing code (Java, Python, etc.) that embeds SQL; build GUIs, APIs.
- Sophisticated Users / Analysts – interact through query languages, report writers, analytics tools; no coding.
- Specialized Users – create non-traditional DB apps (e.g., scientific, GIS); often power users or DBAs themselves.
- Naïve / End Users – use existing applications (ATMs, web forms); oblivious to database internals.
Database Administrator (DBA) – Key Responsibilities
- Schema Definition – design logical schema that satisfies organizational data requirements.
- Storage Structure & Access Methods – choose file organizations, indexing, clustering, partitioning.
- Security & Integrity Constraints – define roles, privileges, domain checks, referential rules.
- Authorization Management – grant/revoke permissions; enforce least-privilege principle.
- Liaison with Users – provide external schemas, answer ad-hoc data needs.
- Assisting Application Programmers – offer design guidelines, PL/SQL procedures, tuning tips.
- Performance Monitoring & Tuning – track query response, resource utilization; adjust physical or logical design.
- Backup & Recovery – implement regular backups (tape, cloud) and disaster-recovery strategies to restore service after failures.