Database Systems Notes

1.0 Introduction to Database Management

  • Definition of a Database (DB)
    • Organized collection of information for easy access, update and management
    • Electronic system enabling manipulation & update of data
  • Definition of a Database Management System (DBMS)
    • Software layer to create, store, process, analyze & secure databases
    • Provides GUI/CLI tools to execute operations (create tables, insert/update/delete, backup, recovery)
    • Ensures protection, security & consistency in multi-user environments

1.1 Popular DBMS Products

  • MySQL – open-source RDBMS founded 1995; acquired by Sun (2008) → Oracle (2010); valued for efficiency\text{efficiency}, reliability & low cost
  • MS Access – desktop database (Jet engine) with GUI; personal & SMB usage
  • Oracle Database – 4th-gen RDBMS for large enterprises; features: integrity constraints, triggers, shared SQL, locking
  • IBM DB2 – relational (with OO extensions); high cost but scalable
  • MS SQL Server – Microsoft RDBMS on Windows; network & internet access
  • FileMaker – cross-platform RDBMS (Win/Mac) with drag-and-drop UI; strong security
  • NoSQL – “Not Only SQL”; non-relational, distributed data stores (e.g., Google, Facebook) for TB/day\text{TB}/\text{day} ingestion
  • PostgreSQL – open-source ORDBMS on Linux/Windows/Solaris
  • FoxPro – mix of DBMS/RDBMS; supports multiple DBF relationships but lacks transactions

1.2 Characteristics of a DBMS

  • Data stored in tables with relationships
  • Reduced redundancy via normalization
  • Data consistency on live (concurrent) data
  • Concurrent multi-user support
  • Query language (e.g., SQL)
  • Security & access control
  • Transaction support (ACID integrity)

Schema & Architecture

  • Database Schema – logical skeleton of DB; defines entities, attributes, constraints
    • Physical schema: files, indexes, storage structure
    • Logical schema: tables, views, integrity rules
  • Three-Level Architecture
    • Physical (Internal) – bit/files on disk
    • Conceptual – overall logical structure
    • External – user views
    Mappings translate requests between levels
  • Data Independence
    • Physical independence – change storage \rightarrow minimal logical change
    • Logical independence – change logical \rightarrow minimal external change

Historical Evolution of DBMS

  • Navigational era – hierarchical (IMS) & network (CODASYL) models
  • Relational era (SQL) – proposed 1970; mass adoption mid-1980s; dominance 1990s-present
  • Post-relational / Object DB – 1980s object databases; object-relational hybrids
  • NoSQL (2000s) – key-value, document, column stores; horizontal scaling
  • NewSQL – retains SQL/relational with NoSQL-like performance

1.3 Traditional File System vs DBMS Approach

File-based System Drawbacks

  • Data redundancy & inconsistency
  • Isolation of data; poor sharing
  • Tight data dependence on programs; costly program maintenance
  • No concurrency, security, atomicity, standard query

DBMS Advantages over Files

  1. Controlled redundancy
  2. Consistency & integrity
  3. Shared centralized data
  4. Security & access rules
  5. Standards enforcement
  6. Economies of scale
  7. Conflict resolution, better accessibility
  8. Data independence & easier maintenance
  9. Concurrency & recovery

DBMS Disadvantages

  • Complexity, large size, performance overhead, single-point failures, high software & conversion cost

1.4 DBMS Components

  • Hardware – disks, controllers, servers, DB machines
  • Software – DBMS, utilities, report writers
  • Data – operational + metadata
  • Users
    • Naïve (ATM)
    • Online
    • Sophisticated (SQL)
    • Specialized (CAD, multimedia)
    • Application programmers
    • Stand-alone & native users
  • Procedures – login, backup, recovery, reorg, performance tuning

1.5 Classification of Database Systems

  • By Data Model: Relational, Object, Object-Relational, Hierarchical, Network
  • By Users: Single-user vs Multi-user
  • By Site Distribution: Centralized, Parallel, Distributed
  • By Usage:
    • OLTP – high-volume short transactions
    • OLAP – analytical, complex queries
    • Big Data / NoSQL
    • Multimedia DBMS

2.0 Database Organization Styles

Centralized Database

  • Stored & maintained at single site
  • Pros: max integrity, minimal redundancy, stronger security, low cost, easier admin, simultaneous access
  • Cons: network dependence, bottlenecks, limited concurrent access, single failure risk

Client–Server Architecture

  • Split into clients (requesters) & servers (providers)
  • Pros: central control, backup, scalability, remote access, security
  • Cons: network congestion, server failure stops ops, higher cost vs P2P
Two-tier vs Three-tier

• 2-tier: Presentation + Data tiers; tight coupling
• 3-tier: Presentation / Application / Data; horizontal scaling, caching, security

Distributed DB Systems

  • Multiple interconnected DBs across sites
  • Types: Homogeneous (same DBMS), Heterogeneous (federated/un-federated)
  • Features: local autonomy, network communication, transparency
  • Pros: modular growth, reliability, local response, lower comms cost
  • Cons: complex software, processing overhead, integrity issues, poor design penalties

3.0 Principles & Techniques of Database Design

Meaning

  • Arrangement of data into a chosen model ensuring integrity & efficiency

Design/Development Cycle

  1. Planning – approve request, prioritize, allocate resources, form team
  2. Requirements Analysis – gather & document user needs
  3. Design – conceptual schema + transaction/app specs
  4. DBMS Selection – technical & economic factors
  5. Implementation – create definitions, apps, populate, test
  6. Maintenance – monitor, tune, evolve schema

4.0 Relational Database System

Definition

  • RDBMS implements Codd’s relational model; data stored in tables (relations)

Core Characteristics

  • Table-based, column/row access, NULL support, catalog, SQL language, view updatability, set ops, logical & physical independence, integrity constraints stored in catalog, distribution independence

Relational Data Model Elements

  • Relation (Table) – subset of Cartesian product D<em>1×D</em>2×DnD<em>1 \times D</em>2 \times … D_n
  • Tuple (Row), Attribute (Column), Domain (atomic value set), Degree (#attributes)
  • Schema – physical & logical descriptions

Integrity Rules

  • Domain integrity – data type, size, default, NULL\text{NULL} rules
  • Entity integrity – primary key unique & non-NULL
  • Referential integrity – FK matches PK or NULL\text{NULL}; actions: CASCADE, SET NULL, RESTRICT

5.0 Entity–Relationship (ER) Modeling

Purpose

  • Graphically depict entities, attributes, relationships for conceptual design

Notation Basics

  • Rectangle = Entity (Strong / Weak)
  • Oval = Attribute (Key, Composite, Multivalued, Derived)
  • Diamond = Relationship
  • Lines w/ symbols denote cardinality & ordinality:
    1:11:1, 1:N1:N, N:MN:M; optional vs mandatory

Relationship Degree & Conventions

  • Unary, Binary, Ternary, N-ary; verbs for relationships, nouns for entities

Steps to Draw ERD

  1. Identify entities
  2. Remove duplicates / non-entities
  3. List attributes & PKs
  4. Define relationships
  5. Specify cardinality/optionality
  6. Check redundancy & iterate

6.0 Normalization

Goal & Importance

  • Restructure relations to minimize redundancy & anomalies; improve integrity & flexibility
  • Codd’s objectives: eliminate anomalies, reduce restructuring, enrich semantics, query neutrality

Anomalies Without Normalization

  • Insert, Update, Delete anomalies due to repeating groups & partial dependencies

Armstrong’s Axioms (FD inference)

  • Reflexivity: if YXXYY \subseteq X \Rightarrow X \to Y
  • Augmentation: XYXZYZX \to Y \Rightarrow XZ \to YZ
  • Transitivity: XY,  YZXZX \to Y, \; Y \to Z \Rightarrow X \to Z

Functional Dependencies

  • XYX \to Y: attribute Y determined by X; determinant vs dependent
  • Full FD (no proper subset determines), Partial FD, Transitive FD

Normal Forms

  • 1NF – atomic values; no repeating groups
  • 2NF – 1NF + no partial FD of non-key on composite PK
  • 3NF – 2NF + no transitive FD of non-key on PK
  • BCNF – every FD XAX \to A, X is super-key
  • Higher NFs (4NF, 5NF) handle multivalued & join dependencies
  • Denormalization – deliberate redundancy for performance

7.0 Querying a Database (SQL)

SQL Categories

  • DDLCREATE,ALTER,DROP,TRUNCATE,COMMENT,RENAME\text{CREATE}, \text{ALTER}, \text{DROP}, \text{TRUNCATE}, \text{COMMENT}, \text{RENAME}
  • DMLSELECT,INSERT,UPDATE,DELETE\text{SELECT}, \text{INSERT}, \text{UPDATE}, \text{DELETE}
  • DCLGRANT,REVOKE\text{GRANT}, \text{REVOKE}
  • TCLCOMMIT,ROLLBACK,SAVEPOINT,SET TRANSACTION\text{COMMIT}, \text{ROLLBACK}, \text{SAVEPOINT}, \text{SET TRANSACTION}

Data Types (MySQL Sample)

  • Text: CHAR, VARCHAR, TINYTEXT …
  • Number: TINYINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL
  • Date/Time: DATE, DATETIME, TIMESTAMP, TIME, YEAR

Schema Commands

  • CREATE DATABASE / USE / DROP\text{CREATE DATABASE / USE / DROP}
  • CREATE TABLE\text{CREATE TABLE} (columns + types)
  • ALTER TABLE\text{ALTER TABLE} – ADD / DROP / MODIFY column
  • TRUNCATE,DROP TABLE\text{TRUNCATE}, \text{DROP TABLE}

Constraints

  • NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT, INDEX

DML Examples

  • INSERT INTO table (col1,…) VALUES (…)
  • SELECT col FROM table
    • DISTINCT, WHERE with operators (=, <>, >, <, BETWEEN, LIKE, IN)
    • Logical AND, OR, NOT
    • ORDER BY ASC/DESC
  • UPDATE table SET col=val WHERE …
  • DELETE FROM table WHERE …

Aggregates & Patterns

  • COUNT(), SUM(), AVG(), MIN(), MAX()
  • Wildcards: % (multi), _ (single) ; charlists [ ] in Access/SQL Server

8.0 Database Security Threats & Risks

  • Malware/Viruses – steal, corrupt, delete data
  • Natural Disasters – destroy infrastructure; need off-site backups
  • Disgruntled Employees – insider data leaks / sabotage
  • Excessive Privileges & Legitimate Abuse – over-granted rights misused
  • Injection Attacks – SQL & NoSQL injection via unsanitized input
  • Storage Media Exposure – theft of backup tapes/disks
  • Unpatched Vulnerabilities – OS/DBMS flaws exploited
  • Weak Authentication & Audit Trails – brute-force, social engineering, undetected misuse
  • Database Rootkits – hidden admin-level malware
  • Human Error – misconfigurations causing breaches

9.0 Key Roles in DB Design & Administration

  • Database Administrator (DBA) – install, configure, monitor, secure, backup, performance tune, license compliance
  • Database Designer – define tables, indexes, views, triggers, storage parameters
  • Database Analyst – study requirements, design physical DB, maintain dictionary, ensure confidentiality
  • Database Developer – design & code DB systems (tables, procs, triggers), optimize performance, backups, security procedures

10.0 Reference Texts

  • Silberschatz, Korth & Sudarshan – “Database System Concepts”
  • Ullman – “Principles of Database Systems”
  • Gillenson – “Fundamentals of DBMS”
  • Coronel & Morris – “Database Principles: Fundamentals of Design”