CS2304 Unit 1 Notes

Course Objectives

The course objectives of CS2304 Database Management Systems Unit 1 focus on building a solid foundation in modern database systems. They include: first, to introduce the core principles and architectures of modern database systems; second, to apply data modelling techniques using the Entity-Relationship (ER) and relational models; third, to design and normalize relational schemas for efficient data storage; fourth, to write effective SQL and PL/SQL programs for data manipulation and transaction control; fifth, to understand the concepts of query processing, optimization, and indexing; and sixth, to explore the role of NoSQL databases, Big Data systems, and distributed databases in modern applications.

Course Outcomes

Course outcomes are aligned with the objectives and emphasize practical modelling and implementation skills. They include: (1) designing and constructing conceptual database models using ER and EER diagrams for real-life applications; (2) transforming high-level data models into normalized relational schemas using functional dependencies and synthesis techniques; (3) applying normalization concepts to develop a quality relational data model; (4) formulating and executing queries using relational algebra, SQL, and developing procedural constructs using PL/SQL; (5) exploring and implementing modern database technologies such as NoSQL and Big Data frameworks like MongoDB and Hadoop; and (6) demonstrating understanding of physical database structures, indexing mechanisms, and query optimization techniques.

Books and References

Core textbooks include: (1) Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts, 6th Edition, McGraw-Hill Education; (2) Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems, 7th Edition, Pearson. Reference books include: (1) Connolly and Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition, Pearson; (2) Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems, 3rd Edition, McGraw Hill Education; (3) Kristina Chodorow, MongoDB: The Definitive Guide, O'Reilly, 2nd Edition, ISBN: 978-93-5110-269-4; (4) Dr. P. S. Deshpande, SQL and PL/SQL for Oracle 10g Black Book; (5) Ivan Bayross, SQL, PL/SQL: The Programming Language of Oracle; (6) Reese G., Yarger R., King T., Williums H, Managing and Using MySQL, Shroff; (7) Dalton Patrik, SQL Server – Black Book; (8) Eric Redmond, Jim Wilson, Seven Databases in Seven Weeks, SPD; (9) Jay Kreibich, Using SQLite, SPD.

Assessment Scheme

The transcript includes a heading for Assessment Scheme but does not provide detailed breakdowns. Expect typical university assessment components such as quizzes, assignments, tutorials/labs, mid-term and final exams, and possibly projects, with emphasis on ER modelling, relational schema design, SQL/PL/SQL programming, and database system concepts.

Unit 1 Content Introduction

Unit 1 covers the need for Database Management Systems, their evolution, database system concepts and architecture, and the database design process. It emphasizes data modeling including the Entity-Relationship (ER) Model, Extended ER Model (EER), Relational Model, and Codd's Rules. The aim is to establish a foundation for modeling, designing, implementing, and tuning database systems.

What is Data?

Data is a collection of distinct units of information that can take many forms such as text, numbers, media, and more. Data can be stored on paper or in electronic memory, and it is essentially information that can be translated into a form suitable for movement and processing. Data comprises raw facts and figures that lack meaning until processed and given context. Data come in various forms, typically alphanumeric, but may also include images or sounds.

What is Data? (continued)

Turning data into information requires processing within an organization. Processed data must be presented in a context that renders it meaningful to the recipient. The fundamental relation is: extINFORMATION=extDATA+extCONTEXT+extMEANING.ext{INFORMATION} = ext{DATA} + ext{CONTEXT} + ext{MEANING}. Information is data that has been processed within a context to give it meaning.

Data vs Information

Data are unprocessed facts such as numbers, words, or observations. Information is data that has been processed, organized, or structured to provide context, meaning, and relevance. Examples illustrate the distinction:

  • Data: "25, 30, 35" or "Red, Blue, Green" without context.

  • Information: "The average age of participants is 30 years" or "The most popular color among users is Blue." The purpose of information is to aid understanding, decision-making, and communication of insights.

Database Conceptualization

A database is an organized collection of data that is stored and managed to enable easy access, retrieval, and updating. Common analogies include the library as a database and books as data; a school registry as a single file containing student details. Types of databases include: RDBMS, NoSQL, Object-Oriented DB, Hierarchical DB, Network DB, Time Series DB, Cloud DB, and Graph DB. The goal of a database is to store large amounts of data efficiently and allow querying and manipulation.

Why Database?

Databases are used to operate large amounts of information by storing, retrieving, and managing data. They are suitable for linking different data sets such as students and grades, customers and sales, patients and doctors, banking transactions across accounts, and taxpayers with tax payments. Benefits include handling large data volumes, easy updates via DML such as SQL, and avoiding the limitations of spreadsheets for large-scale data management.

Access and Security in Databases

Access to data is facilitated by the ability to store large volumes in a single location, support multiple users, and enable fast search and sort operations. Databases implement security through user logins and privileges (read, write, etc.), allowing controlled access. They also support Data Query Languages (DQL) for searching and computing over data.

DBMS Features

A Database Management System (DBMS) enables users to create, manage, and interact with databases efficiently. It provides an interface between the database and users/applications, ensuring data organization, storage, retrieval, and security.
Key functions include: reducing redundancy, backup and recovery, multiple user interfaces (GUI, programmatic), and facilitating data sharing.

Problems with File Systems

File systems suffer from data redundancy and inconsistency due to multiple programs storing similar data in different formats. Accessing specific data can be difficult, leading to poor flexibility for querying. Data isolation occurs as data is spread across many files with different formats. Integrity, atomicity, and concurrency issues arise when consistency constraints, transaction atomicity, and concurrent updates are not properly managed. Security problems occur when unauthorized access is possible or when ad hoc programs bypass constraints.

Need for DBMS

A DBMS offers efficient data storage and retrieval with indexing and querying; data integrity and accuracy; security via authentication, authorization, and encryption; minimized data redundancy through centralized storage; concurrent access by multiple users without conflicts; and reliable data backup and recovery mechanisms.

Evolution of DBMS

DBMS evolution spans generations: early File Systems (1950s–1960s), Hierarchical and Network models (1960s–1970s), Relational Model (1970s–present), Object-Oriented DB (1980s–1990s) and No-SQL DBMS (2000s–present), culminating in NewSQL and Cloud-based DBs (2010s–present).

Database System Concepts and Architecture

Core concepts include: the database, DBMS, view of data (data abstraction), instances and schemas, and data models. The architecture defines levels at which the system operates and how users interact with it.

View of Data: Data Abstraction

Data Abstraction comprises three levels:

  • Physical level: describes how data are actually stored, detailing low-level data structures.

  • Logical level: describes what data are stored and the relationships among them, i.e., the database as a set of simple structures.

  • View level: describes only part of the database, simplifying user interaction by exposing only relevant data.

Instances and Schema

An instance refers to the actual data stored in the database at a moment in time. The schema defines the logical structure of the database, including tables, relationships, and constraints. There is also a physical schema that describes how data are stored on storage media and a logical schema that describes the logical view of data (tables, views).

Data Model

A data model defines how data are represented, stored, and organized. Common data models include: Relational Model (data in tables), Hierarchical Model (tree-like structures), Network Model (graph-like structures), and Object-Oriented Model (objects similar to OOP).

Client-Server Architecture

Modern DBMS often use a client-server architecture with a client application sending queries to the server and receiving results. The server processes requests, interacts with the database, and handles concurrency control. The DBMS runs on the server and provides essential services. Architectures are commonly classified as: (i) Two-Tier: direct client-server communication, (ii) Three-Tier: an application server layer processes logic before talking to the database server.

Detailed System Structure

A database system comprises modules for storage management and the query processor.

  • Storage Manager components include: Authorization and Integrity Manager (enforces permissions and integrity constraints), Transaction Manager (ensures consistency during failures and manages concurrent transactions), File Manager (allocates disk space and manages data structures), Buffer Manager (fetches data from disk to memory and caches data), Data Files (store the actual data), Data Dictionary (metadata about the schema), and Indices (fast data access).

  • The Query Processor includes: DDL interpreter (interprets DDL statements and updates the data dictionary), DML compiler (translates DML into an evaluation plan and performs query optimization to select a low-cost plan), and the Query Evaluation Engine (executes the plan).

Database Design Process

Database design begins with data modeling using ER/EER and Relational models, followed by phases that map concepts to implementation and tuning.

Phases of Database Design

The process is outlined as: Requirements Collection and Analysis; Conceptual Database Design; Choice of a DBMS; Data Model Mapping (Logical Database Design); Physical Database Design; Database System Implementation and Tuning. Each phase contributes to moving from abstract requirements to a functioning, optimized database system.

Approaches to Conceptual Schema Design

Two main approaches exist: (i) Centralized Schema Design (one-shot approach) where a single global schema is designed to meet all applications, which can be time-consuming and places a heavy burden on the DBA, and (ii) View Integration Approach where separate schemas are designed for each user group and later merged into a global conceptual schema. The latter is more practical.

Strategies for Schema Design

Two traditional strategies are discussed: Top-Down and Bottom-Up. In the Top-Down approach, one starts with a high-level schema and refines it; in the Bottom-Up approach, one starts with basic abstractions and then integrates them into a broader schema.

Choice of DBMS (Phase 3)

Choosing a DBMS involves several factors: technical (type of DBMS such as relational, object-relational, or object; storage structures; architectural options), economic (acquisition, maintenance, training, and operating costs; conversion costs), and organizational (philosophy, preference, staff familiarity, vendor services).

Data Model Mapping (Phase 4)

Data model mapping, also called logical database design, maps the conceptual schema from the high-level data model into the DBMS-specific data model once a DBMS type is chosen.

Physical Database Design (Phase 5)

This phase specifies stored database details in terms of physical storage structures, record placement, and indexes, corresponding to the internal schema of the three-level architecture.

Database System Implementation and Tuning (Phase 6)

The final phase covers implementing the database and application programs, testing them, deploying for service, and ongoing tuning, indexing, and data reorganization as performance issues arise throughout the lifecycle.

Entity-Relationship Model (ER)

The ER model is based on real-world entities and relationships. It is ideal for the conceptual design of a database. Core concepts include entity sets, relationship sets, attributes, and constraints. An ER diagram graphically represents the database structure and its connections.

ER Symbols and Notation

The ER model uses various symbols to denote entities, attributes, keys, relationships, and participation. Examples include: entities, attributes (simple, composite, derived, multi-valued, key), relationships (identifying and non-identifying), and participation types (total vs. partial).

Entity and Attributes

An entity is a real-world object distinguishable from others, possessing attributes that describe it. Examples include Banking System with entities such as Customer, Account, and Transaction; E-commerce with Product, Customer, and Order; and University Management with Student, Course, and Instructor. Attributes describe properties of entities (e.g., Customer: Name, CustomerID, Address, PhoneNumber; Account: AccountNumber, AccountType, Balance, Branch).

Attribute Types in ER Models

Attributes can be classified as:

  • Simple Attribute: cannot be divided (e.g., Gender, DOB).

  • Composite Attribute: can be divided into subparts (e.g., Address into Street, City, State, ZipCode; Name into FirstName, MiddleName, LastName).

  • Single-Valued Attribute: holds one value per entity instance (e.g., Customer ID).

  • Multi-Valued Attribute: can hold multiple values (e.g., Phone_Numbers).

  • Derived Attribute: values derived from other attributes (e.g., Age from DateofBirth; TotalSalary from BasicSalary and Allowances).

  • Key Attribute: uniquely identifies an entity instance (e.g., Employee_ID, ISBN).

  • Stored Attribute: stored in the database (e.g., DateofBirth).

  • Complex Attribute: a combination of multi-valued and composite attributes.

ER Diagram Basics and Relationships

ER diagrams help preview how tables should connect, describe entities, attributes, and relationships, and translate into relational tables for implementation. They communicate the logical structure to users and designers. Relationships describe how entities interact; for example, in a university system, a Student enrolls in Courses and a Professor delivers a Course.

Types of Keys in ER Models

A key uniquely identifies an entity or relationship instance. Key types include: Super Key, Candidate Key, Primary Key, Alternate Key, Foreign Key, Composite Key, Unique Key, Surrogate Key, and Secondary Key. A quick overview:

  • Super Key: a set of attributes that uniquely identifies a tuple; may include redundant attributes.

  • Candidate Key: a minimal super key; no redundant attributes.

  • Primary Key: a chosen candidate key used to uniquely identify records; cannot be NULL.

  • Alternate Key: candidate keys not chosen as the primary key.

  • Foreign Key: establishes relationships between tables by referencing a primary/unique key in another relation.

  • Composite Key: a key formed from two or more attributes.

  • Unique Key: ensures uniqueness but may allow a NULL value.

  • Surrogate Key: system-generated unique identifier.

  • Secondary Key: used for indexing or searching.

Candidate Key and Primary Key Examples

A candidate key is a minimal set of attributes that uniquely identifies a row. For example, in a Students table, possible candidate keys might include StudentID, SocialSecurityNumber, or a combination of FirstName and LastName (if unique). The primary key is one chosen candidate key, such as StudentID; others remain as alternate or candidate keys. A table can have multiple candidate keys, but only one is chosen as the primary key. The transcript provides explicit examples, e.g., in a Student table where SocialSecurityNumber and a combined FirstName+LastName could be candidate keys, with StudentID chosen as the primary key.

Problems and Solutions (Key Identification)

The transcript includes several problems about identifying super keys, candidate keys, primary keys, and alternate keys, with solutions. Example Problem 1 presents a Student table with StudentID, Name, Email, and Phone. Solutions identify:

  • Super Keys: {StudentID}, {Email}, {Phone}, {StudentID, Email}, {StudentID, Phone}, {Email, Phone}

  • Candidate Keys: {StudentID}, {Email}, {Phone}

  • Primary Key: {StudentID}

  • Alternate Keys: {Email}, {Phone}.

Problem 2 asks for a composite key to uniquely identify records in a student-course-grade table, with solution: the composite key is {StudentID, CourseID}.

Problem 3 includes a more complex schema with Registration and Course tables and asks for an ERModel; solution provides an ER mapping showing entities such as Student, Course, Registration with appropriate attributes and relationships.

ER to Table Conversion

ER to table conversion is illustrated: entities such as STUDENT, LECTURER, and SUBJECT map to relational tables with attributes like STUDENTID, LECTURERID, SUBJECTID, STUDENTNAME, LECTURERNAME, SUBJECTNAME, DOB, COURSEID, COURSENAME, etc. This demonstrates how ER diagrams are translated into relational schemas with foreign keys and join relations.

Advantages of the ER Model and Extended ER Model

The ER model offers advantages such as conceptual simplicity, clear visual representation, effective communication among stakeholders, strong integration with the relational model, and ease of conversion to other data models. The Extended ER (EER) model adds features like sub- and superclasses (inheritance), specialization/generalization, category/union types, and aggregation to model more complex real-world constraints.

Inheritance, Specialization, Generalization, and Aggregation

  • Superclasses and Subclasses: Inheritance between super and sub-classes is depicted in diagrams.

  • Specialization: Top-down refinement of an entity into subgroups based on characteristics.

  • Generalization: Bottom-up combination of sub-classes to form a superclass.

  • Aggregation: Treats a relationship as a higher-level entity when it involves multiple entities; used to model complex real-world scenarios such as a Manager who both manages Employees and Projects (the relationship between Employee and Project can be treated as a single entity that participates in another relationship).

  • The slides contrast Generalization and Specialization, noting that Generalization reduces schema size and is typically used for groups of entities, while Specialization increases schema size and can be applied to a single entity.

ER Diagram Practice Examples

Several practice scenarios are provided to illustrate ER modelling:

  • Online Shopping System: Entities include Customer (CustomerID, Name, Email, Phone, Address), Product (ProductID, ProductName, Price, Category), Vendor (VendorID, VendorName, ContactInfo); relationships include Orders between Customer and Product (Many-to-Many) with attributes OrderID, OrderDate, Quantity; Supplies between Vendor and Product (One-to-Many).

  • Hospital Management System: Patient, Doctor, and Treatment with relationships Receives (Patient-Treatment, Many-to-Many; attributes DateOfTreatment, Remarks) and Provides (Doctor-Treatment, One-to-Many).

  • Library Management System: Member, Book, Author with relationships Borrows (Member-Book, Many-to-Many) and Writes (Author-Book, One-to-Many).

  • University Management System: Student, Professor, Course, Department with Enrolls (Student-Course, Many-to-Many), Teaches (Professor-Course, One-to-Many), Belongs To (Professor-Department, Many-to-One).

  • Hotel Booking System: Customer, Room, Booking with Books (Customer-Room, Many-to-Many) and attributes BookingDate, Duration, AmountPaid.

  • Airline Reservation System: Passenger, Flight, Ticket with Books (Passenger-Flight, Many-to-Many) and Issued For (Flight-Ticket, One-to-Many).

Relational Model

The Relational Model, proposed by E.F. Codd, models data as relations (tables). After ER modeling, the conceptual model is transformed into the relational model compatible with RDBMS languages like Oracle SQL or MySQL. In the relational model, data are stored in relations (tables). There are several constraints and rules:

  • Domain Constraints: Attribute values must lie within the defined domain.

  • Key Integrity: Each relation should have at least one set of attributes that uniquely identifies a tuple (a key). Keys must be unique and not NULL.

  • Referential Integrity: Values in foreign-key columns must match primary/unique keys in the referenced relation. This maintains valid links between related tables.

Anomalies in Relational Design

Anomalies refer to irregularities that arise when designing databases without proper normalization. Insert, Update, and Delete anomalies can occur in relational designs that do not enforce proper constraints. For example, attempting to insert a row in a referencing relation when the referenced value does not exist; or attempting to delete a row in the referenced relation that is being referenced by rows in the referencing relation. Handling such anomalies often involves referential actions such as ON DELETE CASCADE and ON UPDATE CASCADE.

Codd's Twelve Rules

E.F. Codd proposed twelve rules (plus a foundational Rule 0) that a database must satisfy to be considered truly relational. The rules cover aspects such as the foundation of the relational model, information representation in cells, guaranteed data access, systematic treatment of NULL values, online catalogs (data dictionary), a comprehensive data sublanguage (SQL-like), view updating, relational operations at a high level, physical and logical data independence, distribution independence, and non-subversion of the database by external languages. The rules are typically enumerated as Rule 0 through Rule 12, with Rule 0 serving as the foundation and Rules 1–12 detailing specific relational properties and capabilities.

  • Rule 0 (Foundation Rule): The database must be in relational form so the system can operate using relational capabilities.

  • Rule 1 (Information Rule): All information in the database should be stored in tables (rows and columns).

  • Rule 2 (Guaranteed Access Rule): Every data item must be accessible logically via a combination of table name, primary key, and column name.

  • Rule 3 (Systematic Handling of Null Values): NULL values must be handled in a systematic way.

  • Rule 4 (Online Catalog): A database catalog (data dictionary) should be accessible online.

  • Rule 5 (Comprehensive Data Sub-Language): A comprehensive language (like SQL) must be available for data definition, data manipulation, view definition, and integrity constraints.

  • Rule 6 (View Updating): Views should be updatable.

  • Rule 7 (Relational Low-Level Operations): The system should support high-level relational operations (insert, update, delete) on the relational data.

  • Rule 8 (Physical Data Independence): Changes to storage structure should not affect application programs.

  • Rule 9 (Logical Data Independence): Changes to the logical schema should not affect application programs.

  • Rule 10 (Integrity Independence): Integrity constraints should be defined in the catalog and not rely on external programs.

  • Rule 11 (Distribution Independence): The database should operate regardless of its distribution across locations.

  • Rule 12 (Non-Subversion): All access should be via the relational language; external interfaces should not bypass integrity constraints.

Quick Reference and Practice

The unit includes quick references for ER modelling terminology, and practice problems with solutions that illustrate identifying keys, mapping ER diagrams to tables, and recognizing different types of relationships and cardinalities. It also includes ER diagram notations such as one-to-one, one-to-many, many-to-one, and many-to-many relationships along with their attributes and participating entities.

Notation and Cardinality in ER Diagrams

In ER modelling, cardinalities describe how many entities participate in a relationship. Common cases include one-to-one, one-to-many, many-to-one, and many-to-many. The degree of a relationship is the number of entity types that participate in it; for example, a binary relationship has a degree of 2, a ternary relationship has a degree of 3, and so on. This concept helps in determining how to implement the relationship in a relational schema and in the ERD mapping.

ER Diagram Practice and Conversion Examples (Summary)

  • The ER-to-table conversion demonstrates mapping each entity to a table with a primary key, mapping attributes to columns, and implementing relationships via foreign keys and potentially join tables for many-to-many relationships.

  • For example, in a Student-Department-Course scenario, Student, Course, and Department become separate tables with relationships implemented via foreign keys (e.g., Course.DepartmentID -> Department.DepartmentID).

  • Extended ER models add inheritance and aggregation constructs to more accurately reflect real-world constraints and relationships.

Summary of Key Concepts

  • DBMS concepts include data abstraction, schemas, instances, data models, and the three levels of data abstraction: physical, logical, and view.

  • Data modeling with ER and EER forms the basis for mapping to relational schemas.

  • The relational model stores data in tables with keys to enforce entity identity and relationships.

  • Codd's Rules define the criteria for true relational databases, emphasizing data independence, comprehensive query language support, and integrity constraints all within the catalog.

  • ER diagrams serve as blueprints for database design and can be converted into relational schemas for implementation.

  • The design process includes phases from requirements gathering to physical design and tuning, with conceptual, logical, and physical mapping steps.

  • Problems with file systems motivate the use of DBMSs, including issues like redundancy, data isolation, integrity, concurrency, and security.

  • The unit emphasizes practical skills in ER modelling, relational design, and understanding the architecture and components of modern DBMSs, including storage managers and query processors.

Equations and Notations Used in Notes

  • Information equation: INFORMATION=DATA+CONTEXT+MEANING\text{INFORMATION} = \text{DATA} + \text{CONTEXT} + \text{MEANING}

  • Relationship cardinalities are denoted with standard notations such as 1:N1:\, N representing one-to-many relationships and M:NM:\, N representing many-to-many relationships.

  • Data abstraction levels are described as Physical, Logical, and View levels, with the three levels constituting the data abstraction hierarchy.

  • The degrees of relationships are discussed as binary (2), ternary (3), etc., denoting the number of entity types participating in a relationship.

End of Unit 1 Overview

If you want, I can reorganize these notes into a concise study sheet or expand any specific section with more examples and diagrams (ER diagrams, table mappings, or SQL/PL/SQL examples).