Database Systems Notes
1.0 Introduction to Database Management
- Definition of a Database (DB)
• Organized collection of information for easy access, update and management
• Electronic system enabling manipulation & update of data - Definition of a Database Management System (DBMS)
• Software layer to create, store, process, analyze & secure databases
• Provides GUI/CLI tools to execute operations (create tables, insert/update/delete, backup, recovery)
• Ensures protection, security & consistency in multi-user environments
1.1 Popular DBMS Products
- MySQL – open-source RDBMS founded 1995; acquired by Sun (2008) → Oracle (2010); valued for , reliability & low cost
- MS Access – desktop database (Jet engine) with GUI; personal & SMB usage
- Oracle Database – 4th-gen RDBMS for large enterprises; features: integrity constraints, triggers, shared SQL, locking
- IBM DB2 – relational (with OO extensions); high cost but scalable
- MS SQL Server – Microsoft RDBMS on Windows; network & internet access
- FileMaker – cross-platform RDBMS (Win/Mac) with drag-and-drop UI; strong security
- NoSQL – “Not Only SQL”; non-relational, distributed data stores (e.g., Google, Facebook) for ingestion
- PostgreSQL – open-source ORDBMS on Linux/Windows/Solaris
- FoxPro – mix of DBMS/RDBMS; supports multiple DBF relationships but lacks transactions
1.2 Characteristics of a DBMS
- Data stored in tables with relationships
- Reduced redundancy via normalization
- Data consistency on live (concurrent) data
- Concurrent multi-user support
- Query language (e.g., SQL)
- Security & access control
- Transaction support (ACID integrity)
Schema & Architecture
- Database Schema – logical skeleton of DB; defines entities, attributes, constraints
• Physical schema: files, indexes, storage structure
• Logical schema: tables, views, integrity rules - Three-Level Architecture
• Physical (Internal) – bit/files on disk
• Conceptual – overall logical structure
• External – user views
• Mappings translate requests between levels - Data Independence
• Physical independence – change storage minimal logical change
• Logical independence – change logical minimal external change
Historical Evolution of DBMS
- Navigational era – hierarchical (IMS) & network (CODASYL) models
- Relational era (SQL) – proposed 1970; mass adoption mid-1980s; dominance 1990s-present
- Post-relational / Object DB – 1980s object databases; object-relational hybrids
- NoSQL (2000s) – key-value, document, column stores; horizontal scaling
- NewSQL – retains SQL/relational with NoSQL-like performance
1.3 Traditional File System vs DBMS Approach
File-based System Drawbacks
- Data redundancy & inconsistency
- Isolation of data; poor sharing
- Tight data dependence on programs; costly program maintenance
- No concurrency, security, atomicity, standard query
DBMS Advantages over Files
- Controlled redundancy
- Consistency & integrity
- Shared centralized data
- Security & access rules
- Standards enforcement
- Economies of scale
- Conflict resolution, better accessibility
- Data independence & easier maintenance
- Concurrency & recovery
DBMS Disadvantages
- Complexity, large size, performance overhead, single-point failures, high software & conversion cost
1.4 DBMS Components
- Hardware – disks, controllers, servers, DB machines
- Software – DBMS, utilities, report writers
- Data – operational + metadata
- Users
• Naïve (ATM)
• Online
• Sophisticated (SQL)
• Specialized (CAD, multimedia)
• Application programmers
• Stand-alone & native users - Procedures – login, backup, recovery, reorg, performance tuning
1.5 Classification of Database Systems
- By Data Model: Relational, Object, Object-Relational, Hierarchical, Network
- By Users: Single-user vs Multi-user
- By Site Distribution: Centralized, Parallel, Distributed
- By Usage:
• OLTP – high-volume short transactions
• OLAP – analytical, complex queries
• Big Data / NoSQL
• Multimedia DBMS
2.0 Database Organization Styles
Centralized Database
- Stored & maintained at single site
- Pros: max integrity, minimal redundancy, stronger security, low cost, easier admin, simultaneous access
- Cons: network dependence, bottlenecks, limited concurrent access, single failure risk
Client–Server Architecture
- Split into clients (requesters) & servers (providers)
- Pros: central control, backup, scalability, remote access, security
- Cons: network congestion, server failure stops ops, higher cost vs P2P
Two-tier vs Three-tier
• 2-tier: Presentation + Data tiers; tight coupling
• 3-tier: Presentation / Application / Data; horizontal scaling, caching, security
Distributed DB Systems
- Multiple interconnected DBs across sites
- Types: Homogeneous (same DBMS), Heterogeneous (federated/un-federated)
- Features: local autonomy, network communication, transparency
- Pros: modular growth, reliability, local response, lower comms cost
- Cons: complex software, processing overhead, integrity issues, poor design penalties
3.0 Principles & Techniques of Database Design
Meaning
- Arrangement of data into a chosen model ensuring integrity & efficiency
Design/Development Cycle
- Planning – approve request, prioritize, allocate resources, form team
- Requirements Analysis – gather & document user needs
- Design – conceptual schema + transaction/app specs
- DBMS Selection – technical & economic factors
- Implementation – create definitions, apps, populate, test
- Maintenance – monitor, tune, evolve schema
4.0 Relational Database System
Definition
- RDBMS implements Codd’s relational model; data stored in tables (relations)
Core Characteristics
- Table-based, column/row access, NULL support, catalog, SQL language, view updatability, set ops, logical & physical independence, integrity constraints stored in catalog, distribution independence
Relational Data Model Elements
- Relation (Table) – subset of Cartesian product
- Tuple (Row), Attribute (Column), Domain (atomic value set), Degree (#attributes)
- Schema – physical & logical descriptions
Integrity Rules
- Domain integrity – data type, size, default, rules
- Entity integrity – primary key unique & non-NULL
- Referential integrity – FK matches PK or ; actions: CASCADE, SET NULL, RESTRICT
5.0 Entity–Relationship (ER) Modeling
Purpose
- Graphically depict entities, attributes, relationships for conceptual design
Notation Basics
- Rectangle = Entity (Strong / Weak)
- Oval = Attribute (Key, Composite, Multivalued, Derived)
- Diamond = Relationship
- Lines w/ symbols denote cardinality & ordinality:
• , , ; optional vs mandatory
Relationship Degree & Conventions
- Unary, Binary, Ternary, N-ary; verbs for relationships, nouns for entities
Steps to Draw ERD
- Identify entities
- Remove duplicates / non-entities
- List attributes & PKs
- Define relationships
- Specify cardinality/optionality
- Check redundancy & iterate
6.0 Normalization
Goal & Importance
- Restructure relations to minimize redundancy & anomalies; improve integrity & flexibility
- Codd’s objectives: eliminate anomalies, reduce restructuring, enrich semantics, query neutrality
Anomalies Without Normalization
- Insert, Update, Delete anomalies due to repeating groups & partial dependencies
Armstrong’s Axioms (FD inference)
- Reflexivity: if
- Augmentation:
- Transitivity:
Functional Dependencies
- : attribute Y determined by X; determinant vs dependent
- Full FD (no proper subset determines), Partial FD, Transitive FD
Normal Forms
- 1NF – atomic values; no repeating groups
- 2NF – 1NF + no partial FD of non-key on composite PK
- 3NF – 2NF + no transitive FD of non-key on PK
- BCNF – every FD , X is super-key
- Higher NFs (4NF, 5NF) handle multivalued & join dependencies
- Denormalization – deliberate redundancy for performance
7.0 Querying a Database (SQL)
SQL Categories
- DDL –
- DML –
- DCL –
- TCL –
Data Types (MySQL Sample)
- Text: CHAR, VARCHAR, TINYTEXT …
- Number: TINYINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL
- Date/Time: DATE, DATETIME, TIMESTAMP, TIME, YEAR
Schema Commands
- (columns + types)
- – ADD / DROP / MODIFY column
Constraints
- NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT, INDEX
DML Examples
- INSERT INTO table (col1,…) VALUES (…)
- SELECT col FROM table
• DISTINCT, WHERE with operators (=, <>, >, <, BETWEEN, LIKE, IN)
• Logical AND, OR, NOT
• ORDER BY ASC/DESC - UPDATE table SET col=val WHERE …
- DELETE FROM table WHERE …
Aggregates & Patterns
- COUNT(), SUM(), AVG(), MIN(), MAX()
- Wildcards: % (multi), _ (single) ; charlists [ ] in Access/SQL Server
8.0 Database Security Threats & Risks
- Malware/Viruses – steal, corrupt, delete data
- Natural Disasters – destroy infrastructure; need off-site backups
- Disgruntled Employees – insider data leaks / sabotage
- Excessive Privileges & Legitimate Abuse – over-granted rights misused
- Injection Attacks – SQL & NoSQL injection via unsanitized input
- Storage Media Exposure – theft of backup tapes/disks
- Unpatched Vulnerabilities – OS/DBMS flaws exploited
- Weak Authentication & Audit Trails – brute-force, social engineering, undetected misuse
- Database Rootkits – hidden admin-level malware
- Human Error – misconfigurations causing breaches
9.0 Key Roles in DB Design & Administration
- Database Administrator (DBA) – install, configure, monitor, secure, backup, performance tune, license compliance
- Database Designer – define tables, indexes, views, triggers, storage parameters
- Database Analyst – study requirements, design physical DB, maintain dictionary, ensure confidentiality
- Database Developer – design & code DB systems (tables, procs, triggers), optimize performance, backups, security procedures
10.0 Reference Texts
- Silberschatz, Korth & Sudarshan – “Database System Concepts”
- Ullman – “Principles of Database Systems”
- Gillenson – “Fundamentals of DBMS”
- Coronel & Morris – “Database Principles: Fundamentals of Design”