Unit 1 – Introduction to Database Management Systems

Background
  • A database is an organized collection of logically-related data that describes a specific enterprise or domain.
  • Database Management System (DBMS) = Database + set of pre-written programs that store, update, retrieve, secure, and manage the data.
    • Accepts requests from application programs, instructs the operating system to move the correct data, and enforces protection, security, and consistency (especially for multi-user environments).
    • Widely used products include MySQL, Oracle, SQL Server, PostgreSQL, MongoDB, Sybase, Informix, etc.
Core Terminology
  • Data – raw facts that can be captured in text, numbers, images, audio, video, graphs, or document files. Example: name, age, gender, weight.
  • Information – data that have been processed/organized so that they reveal meaning and aid decision making.
  • Metadata – “data about data”; e.g., field name, data type, length, min/max values, description, constraints; usually kept in a central repository (system catalog).
  • Database – the overall collection of inter-related data of any size or complexity; structured to be easily stored, manipulated, and retrieved.
  • Data warehouse / data mart – separate historical & summarized database designed for reporting and analytical processing.
  • Field (Data Item) – single character or group of characters with specific meaning; e.g., customer_id, city.
  • Record (Tuple) – collection of logically related fields; e.g., (id, name, address, city).
  • Table / Relation – collection of related records; columns are Fields/Attributes/Domains, rows are Tuples/Records.
Purpose of a DBMS
  • Replaces early file-processing systems where each application owned its own data files.
  • Provides unified environment for insertion, update, deletion, and retrieval without writing a new low-level program for every task.
Shortcomings of Conventional File Processing (Pre-DBMS)
  • Data redundancy → duplicated values, wasted storage, slower access.
  • Data inconsistency → different copies disagree (e.g., changed employee address appears in HR but not Payroll).
  • Data isolation → data scattered across formats/files; new programs must know every format.
  • Difficult data access → no high-level query capability.
  • Integrity problems → hard to enforce business rules or domain constraints across many files; risks to physical & logical integrity.
  • Atomicity problems → partial updates if program crashes (violates the "all-or-nothing" rule).
  • Concurrent-access anomalies → unsupervised simultaneous updates produce wrong results (e.g., two withdrawals on same account leave balance 40004000 or 30003000 when correct is 20002000).
  • Security problems → ad-hoc programs make fine-grained access control almost impossible.
Characteristics of Database Systems
  • Model a slice of the real world and preserve logical relationships among data.
  • Provide multiple views of the same data for different stakeholders.
  • Support efficient and convenient insertion, deletion, and modification while maintaining integrity, atomicity, concurrency control, and security.
Advantages of DBMS
  • Eliminates redundancy; centralizes data definition.
  • Powerful query languages allow flexible retrieval.
  • Logical isolation of data into separate tables improves organization.
  • Built-in integrity constraints uphold accuracy.
  • Enforces atomic, consistent, isolated, durable (ACID) transactions.
  • Supports synchronized concurrent access by many users.
  • Fine-grained security policies restrict who sees what.
Disadvantages of DBMS
  • Complex design and time-consuming initial setup.
  • Significant hardware/software investment; maintenance cost.
  • Failure or corruption in one component can compromise the entire database.
  • High conversion cost when migrating from legacy file systems.
  • Staff must be trained (DBA, developers, power users).
Database System Applications (Typical Domains)
  • Airlines/Railways – reservations, timetable display.
  • Banking – accounts, loans, ATM & online transactions.
  • Education – course registration, results, student services.
  • Telecommunications – network inventory, call detail records, billing.
  • Credit-card processing – purchase tracking, statement generation.
  • E-commerce – product catalog integration, orders, payments.
  • Health Care – electronic medical/health records, patient management.
Data Models (Logical Models)
  1. Hierarchical Model
    • Tree structure starting from a single root; each child has one parent.
    • Supports one-to-many relationships (e.g., department → courses → students).
  2. Network Model
    • General graph structure; records can have multiple parents (many-to-many capability); faster traversal than hierarchy; dominated pre-relational era.
  3. Entity-Relationship (E-R) Model
    • Conceptual design approach: split reality into entities (objects) and attributes (properties); connect entities with relationships.
    • Good for high-level design then mapped to relational tables.
  4. Relational Model (E. F. Codd, 19701970)
    • Data organized as two-dimensional tables (relations); relationships maintained by common keys/fields; mathematically founded on set theory & first-order predicate logic.
  5. Object-Oriented Model
    • Treats data as objects grouped into classes with attributes & methods, supporting encapsulation, inheritance, reusability.
Database Architectures
  • One-tier – user sits directly at DBMS console; any change goes straight to database (used mainly by DBAs for admin, prototyping).
  • Two-tier (Client–Server)
    • GUI or application logic on client; DBMS engine on server; direct network connection for queries/transactions; server handles query processing & transaction management.
  • Three-tier
    • Adds application (middle) tier → clients talk to app server; app server talks to database. Isolation improves scalability, security, and is common in large web systems.
ANSI–SPARC Three-Level Data Abstraction
  1. Physical / Internal Level
    • Lowest level; describes actual storage structures, indexes, file organization; chosen & tuned by DBA.
  2. Conceptual / Logical Level
    • Middle level; describes entire database in terms of logical tables, relationships, constraints; hides physical details.
  3. External / View Level
    • Highest level; multiple user-specific views or subsets; shields users from logical & physical complexity.
Mapping Between Levels
  • Conceptual ↔ Internal Mapping – couples conceptual schema to physical storage; must change if storage layout changes.
  • External ↔ Conceptual Mapping – couples each user view to the conceptual schema; multiple such mappings coexist.
Data Independence
  • Ability to change one schema level without impacting the next higher level.
  1. Physical Data Independence – internal changes (e.g., new indexes, file splits) do not affect conceptual schema.
  2. Logical Data Independence – conceptual changes (e.g., splitting table, adding attribute) do not force changes to external views.
DBMS Functional Components
  • Query Processor Units
    • DDL Interpreter – converts DDL statements to metadata tables.
    • DML Compiler – translates DML statements to low-level execution plan.
    • Embedded DML Pre-compiler – turns DML embedded in host languages into host procedure calls.
    • Query Evaluation Engine – executes compiled plans.
  • Storage Manager Units
    • Authorization Manager – checks user privileges.
    • Integrity Manager – enforces constraints.
    • Transaction Manager – guarantees atomicity & concurrency control.
    • File Manager – allocates/deallocates disk space.
    • Buffer Manager – pages data between disk and RAM.
  • Supporting Data Structures
    • Data Files – user data.
    • Data Dictionary / System Catalog – metadata (heavily accessed).
    • Indices – secondary structures for fast access.
    • Statistical Data – histograms, selectivity stats for query optimizer.
Database Languages
CategoryPurposeTypical Keywords
Data Definition Language (DDL)Define or alter schemaCREATE, ALTER, DROP, TRUNCATE, RENAME, COMMENT
Data Manipulation Language (DML)Retrieve & modify dataSELECT, INSERT, UPDATE, DELETE, MERGE
Data Control Language (DCL)Control accessGRANT, REVOKE
Transaction Control Language (TCL)Manage transactionsCOMMIT, ROLLBACK, SAVEPOINT, SET AUTOCOMMIT
  • Two DML “styles”: Procedural (specify what and how) vs Declarative (specify only what; optimizer decides how).
Database Users
  1. Application Programmers – professionals writing code (Java, Python, etc.) that embeds SQL; build GUIs, APIs.
  2. Sophisticated Users / Analysts – interact through query languages, report writers, analytics tools; no coding.
  3. Specialized Users – create non-traditional DB apps (e.g., scientific, GIS); often power users or DBAs themselves.
  4. Naïve / End Users – use existing applications (ATMs, web forms); oblivious to database internals.
Database Administrator (DBA) – Key Responsibilities
  • Schema Definition – design logical schema that satisfies organizational data requirements.
  • Storage Structure & Access Methods – choose file organizations, indexing, clustering, partitioning.
  • Security & Integrity Constraints – define roles, privileges, domain checks, referential rules.
  • Authorization Management – grant/revoke permissions; enforce least-privilege principle.
  • Liaison with Users – provide external schemas, answer ad-hoc data needs.
  • Assisting Application Programmers – offer design guidelines, PL/SQL procedures, tuning tips.
  • Performance Monitoring & Tuning – track query response, resource utilization; adjust physical or logical design.
  • Backup & Recovery – implement regular backups (tape, cloud) and disaster-recovery strategies to restore service after failures.