Database Management Systems – Unit 1 Notes

DBMS = Database + Management System
- Database: collection of inter-related data.
- Management System: set of programs that store & retrieve data efficiently.
Formal definition: “A DBMS is a collection of inter-related data and a set of programs to store and access those data in an easy and effective manner.”
Core purpose: manage data for organisations (e.g., a university storing records of students, teachers, courses, books, etc.).

Storage efficiency
- Redundant (duplicate) data removed → occupies less space.
- Example: A person with two bank accounts stored once instead of twice.
Fast retrieval
- Structured querying provides quick access.
Basic data-management operations supported
- Add, store, update, delete, and retrieve data.
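
The storage-efficiency example and the basic operations above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (customer, account), the column names and the sample values are assumptions made up for the sketch, not part of the notes.

```python
import sqlite3

# In-memory database; all table, column and data values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT NOT NULL
    );
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
        balance INTEGER NOT NULL
    );
""")

# Add: the person is stored once, even though they hold two accounts.
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Pune')")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(101, 1, 500), (102, 1, 1200)],
)

# Update: a change of city touches exactly one row, so copies cannot disagree.
cur = conn.execute("UPDATE customer SET city = 'Mumbai' WHERE cust_id = 1")
rows_updated = cur.rowcount

# Retrieve: a join rebuilds the per-account view of the customer on demand.
accounts = conn.execute("""
    SELECT a.acc_no, c.name, c.city
    FROM account AS a JOIN customer AS c ON c.cust_id = a.cust_id
    ORDER BY a.acc_no
""").fetchall()

# Delete: closing one account leaves the customer row and other account intact.
conn.execute("DELETE FROM account WHERE acc_no = 102")
remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Because the customer details live in one row, both accounts pick up the new city through the join without any second copy to keep in sync.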

Drawbacks of file-based systems (the problems a DBMS is meant to address):
Data redundancy → higher storage cost & slower access.
Data inconsistency (mismatched copies after updates).
Data isolation (scattered files, varied formats → complex coding for retrieval).
Difficulty in ad-hoc access; every new requirement needs a new program.
Application-data dependency (file changes ⇒ program changes).
Integrity problems: enforcing complex constraints across files is hard.
Atomicity issues: hard to ensure “all-or-nothing” execution (e.g., fund transfer).
Concurrency anomalies: uncontrolled simultaneous updates yield inconsistency.
Security limitations: fine-grained user access difficult.
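
The "all-or-nothing" fund transfer mentioned under atomicity can be sketched as follows. It assumes Python's sqlite3 module, a hypothetical account table, and a fail_midway flag that exists only to simulate a crash between the debit and the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and balances, used only for this illustration.
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        # `with conn` commits if the block finishes and rolls back
        # if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as {acc_no: balance}."""
    return dict(conn.execute("SELECT acc_no, balance FROM account"))


failed = transfer(conn, 1, 2, 30, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back

succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)   # both updates were applied together
```

With plain files, the program itself would have to detect the half-finished transfer and undo the debit; here the rollback is handled by the transaction.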

Internal Schema = Physical Schema (maps conceptual schema to storage structures).
Conceptual Schema = Logical Schema (overall logical design).
External Schema = View Schema (user-specific views).
Objectives
- Users share one database yet see customized views.
- Users unaffected by storage details.
- DBA can change storage without altering user views.
- Physical changes should not disturb logical structure.
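
A minimal sketch of the external (view) level, assuming Python's sqlite3 module. The instructor table matches the one used later in these notes; the two view names and the sample rows are hypothetical. SQLite has no GRANT/REVOKE, so this only shows how different users can be given different views of one shared table, not how access is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual (logical) schema: one shared table.
    CREATE TABLE instructor (
        ID         CHAR(5),
        name       VARCHAR(20),
        dept_name  VARCHAR(20),
        salary     NUMERIC(8,2)
    );
    -- Hypothetical sample rows.
    INSERT INTO instructor VALUES ('00001', 'Rao',   'Comp. Sci.', 65000);
    INSERT INTO instructor VALUES ('00002', 'Mehta', 'Physics',    95000);

    -- External schema 1 (hypothetical): what students see, salary hidden.
    CREATE VIEW instructor_public AS
        SELECT ID, name, dept_name FROM instructor;

    -- External schema 2 (hypothetical): payroll totals for the accounts office.
    CREATE VIEW dept_payroll AS
        SELECT dept_name, SUM(salary) AS total_salary
        FROM instructor
        GROUP BY dept_name;
""")

# Column names exposed by the student-facing view.
public_cols = [
    d[0] for d in conn.execute("SELECT * FROM instructor_public").description
]

# Rows exposed by the payroll view.
payroll = conn.execute(
    "SELECT dept_name, total_salary FROM dept_payroll ORDER BY dept_name"
).fetchall()
```

Both views are defined over the same stored table, so neither user group needs to know how or where the rows are physically kept.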

Schema: formal design/blueprint (structure, rules, relationships).
- Three kinds: Physical, Logical, View schemas.
- Relatively stable; rarely changes.
- Analogy: standard-house plan or table template.
Instance: actual data stored at a specific moment.
- Highly dynamic; changes with every insert/delete/update.
- Analogy: individual built houses or data-filled table rows.
Key point: the instance changes frequently; the schema does not.
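
The schema/instance distinction can be observed directly. The sketch below assumes Python's sqlite3 module and a hypothetical student table: it records the schema before and after a few changes and takes a snapshot of the instance after each one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")


def schema(conn):
    """Column names and declared types: the 'blueprint'."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    """The rows stored right now: the 'built houses'."""
    return conn.execute(
        "SELECT roll_no, name FROM student ORDER BY roll_no"
    ).fetchall()


schema_before = schema(conn)
snapshots = [instance(conn)]          # empty table

conn.execute("INSERT INTO student VALUES (1, 'Ravi')")
snapshots.append(instance(conn))

conn.execute("INSERT INTO student VALUES (2, 'Meena')")
snapshots.append(instance(conn))

conn.execute("DELETE FROM student WHERE roll_no = 1")
snapshots.append(instance(conn))

schema_after = schema(conn)
```

Every insert or delete produces a different snapshot, while the schema recorded before and after is identical.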

Data independence: the ability to change the schema at one level without affecting the levels above it.
Two flavours:
1. Physical Data Independence
- Modify physical storage (e.g., new indexes, a different file organisation, a move to SSD storage) without changing logical or external schemas.
- Enables performance tuning.
2. Logical Data Independence
- Modify logical schema (e.g., add/remove attributes, apply constraints) without altering external views or application code.
Visual summary (as presented):
$\text{VIEW LEVEL} \xleftrightarrow{\text{Logical Independence}} \text{LOGICAL LEVEL} \xleftrightarrow{\text{Physical Independence}} \text{PHYSICAL LEVEL}$
Importance: supports evolution of DBMS while protecting applications and users from cascading changes.
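
Both flavours of data independence can be sketched with one unchanged query, assuming Python's sqlite3 module. The view name instructor_names, the index name, the added email column and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (
        ID         CHAR(5),
        name       VARCHAR(20),
        dept_name  VARCHAR(20),
        salary     NUMERIC(8,2)
    );
    -- Hypothetical sample rows.
    INSERT INTO instructor VALUES ('00001', 'Rao',   'Comp. Sci.', 65000);
    INSERT INTO instructor VALUES ('00002', 'Mehta', 'Physics',    95000);

    -- View level: the only thing the application queries.
    CREATE VIEW instructor_names AS
        SELECT ID, name FROM instructor;
""")

# The application's query never changes in this sketch.
APP_QUERY = "SELECT ID, name FROM instructor_names ORDER BY ID"
baseline = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index (a storage/access-path decision).
conn.execute("CREATE INDEX idx_instructor_dept ON instructor(dept_name)")
after_physical = conn.execute(APP_QUERY).fetchall()

# Logical change: add an attribute to the logical schema.
conn.execute("ALTER TABLE instructor ADD COLUMN email VARCHAR(40)")
after_logical = conn.execute(APP_QUERY).fetchall()
```

The view lists its columns explicitly, which is why the added attribute does not leak into it. A logical change that removed a column the view depends on would still break the view, so logical independence is usually the harder of the two to achieve.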

In practice, SQL integrates all components, but conceptually we distinguish four sub-languages (commonly listed as DDL, DML, DCL and TCL).
DDL example:

  CREATE TABLE instructor (
      ID         CHAR(5),
      name       VARCHAR(20),
      dept_name  VARCHAR(20),
      salary     NUMERIC(8,2)
  );

The DDL compiler produces table templates that are stored in the data dictionary, a metadata repository that holds the schema, integrity constraints, and authorizations.
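
As a rough illustration of the data dictionary, the sketch below runs the CREATE TABLE statement above through Python's sqlite3 module and then reads the metadata back. SQLite's catalog table (sqlite_master) and its PRAGMA table_info command are specific to SQLite; other systems expose their dictionary differently, so treat this as one example rather than a general interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The DDL statement from the notes.
conn.execute("""
    CREATE TABLE instructor (
        ID         CHAR(5),
        name       VARCHAR(20),
        dept_name  VARCHAR(20),
        salary     NUMERIC(8,2)
    )
""")

# SQLite's catalog: one row per schema object, including the original DDL text.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master"
).fetchall()

# Per-column metadata: (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(instructor)").fetchall()
column_names = [row[1] for row in columns]

# No data has been inserted: the dictionary describes structure, not contents.
row_count = conn.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
```

The catalog is populated by the DDL statement alone, before any row exists in the table.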

Data abstraction ≈ using a remote control to operate a TV without knowing internal circuits.
Schema ≈ blueprint of the TV showing all components.
Schema vs. Instance: blueprint (schema) vs. actual built houses (instances).

Security & privacy: DBMS provides fine-grained access control (students cannot read teacher payroll).
Integrity constraints: the DBMS enforces business rules (e.g., department account balance ≥ 0).
Concurrency control: prevents anomalies such as two withdrawals reading the same balance. ACID details are covered in later units; the motivation is the concurrency problem noted among the file-system drawbacks.
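
The "balance ≥ 0" rule can be declared once in the schema instead of being re-coded in every program. The sketch below assumes Python's sqlite3 module; the dept_account table and the withdraw helper are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK clause is the integrity constraint.
conn.execute("""
    CREATE TABLE dept_account (
        dept_name TEXT PRIMARY KEY,
        balance   INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO dept_account VALUES ('Physics', 100)")
conn.commit()


def withdraw(conn, dept, amount):
    """Try to withdraw; return False if the constraint rejects the update."""
    try:
        with conn:  # rolls back automatically if the statement fails
            # Read and write happen in one statement, so the new balance
            # is computed from the stored value, not from a stale copy.
            conn.execute(
                "UPDATE dept_account SET balance = balance - ? WHERE dept_name = ?",
                (amount, dept),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint would be violated.
        return False


first = withdraw(conn, "Physics", 60)    # 100 -> 40, allowed
second = withdraw(conn, "Physics", 60)   # would give -20, rejected
final_balance = conn.execute(
    "SELECT balance FROM dept_account WHERE dept_name = 'Physics'"
).fetchone()[0]
```

The second withdrawal is refused by the database itself, so no application can drive the balance negative by forgetting the check.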

Relational design theory and normalization (1NF ⇒ BCNF ⇒ 4NF) build on the logical schema concepts introduced here.
The SQL introduced here forms the foundation for Intermediate & Advanced SQL in Unit-2.
ACID, serializability, and NoSQL topics in Unit-3 rely on understanding data independence and abstraction layers.