Structured Query Language and Relational Database Concepts — Comprehensive Study Notes

Basic Concepts: Databases and SQL

  • A database is an organized collection of data stored electronically. It allows efficient storage, access, and management of large amounts of information, often structured in tables with rows and columns, enabling users to find and retrieve data based on defined relationships among data points.

  • SQL (Structured Query Language) is the standard language for accessing databases. It is an ANSI (American National Standards Institute) standard and lets you access and manipulate databases.

  • SQL-based relational database applications have three parts:

    • End-user interface: Allows the end user to interact with the data.

    • Set of tables stored in the database: Each table is independent; rows in different tables relate via common attribute values.

    • SQL engine: Executes queries.

The Relational Model (RF) and Core Concepts

  • Developed by E.F. Codd (IBM) in 1970.

  • Core idea: data stored in tables (relations); a table is a matrix of rows and columns; each row is a tuple; each column is an attribute.

  • Example: A Student table with attributes like ID, NAME, CLASS, MARK, etc. The relation stores related entities; a row represents a specific entity instance.

  • Relational data management system (RDBMS) hides underlying complexity and provides a relational diagram that maps entities, attributes, and relationships to relational tables.

Entities and Attributes

  • Entity: a person, object, or abstract concept that exists and can be distinguished.

  • Attribute: a property that describes an entity.

  • Example domains and attribute sets (illustrative lists from the transcript):

    • Entity types include PERSON, TOY, ROOM, COUNTRY, ENGINE, TITLE, PRICE, ADDRESS, DOB, GENDER, etc.

    • Attributes describe instances of an entity (e.g., for a PERSON: NAME, DOB, ADDRESS).

  • Notation convention (per the notes):

    • The entity name written in capital letters and singular.

    • Attributes listed in parentheses after the entity name.

    • Examples:

    • PUPIL (candidatenumber, forename, surname, dob, tutorgroup)

    • CAR (registration_plate, make, model, colour)

    • DOCTOR APPOINTMENT (date, time, consultant, location)

External, Conceptual, Internal Schemas; Data Abstraction Levels

  • External Model: end users’ views of the data environment; ER diagrams represent external views; external schema defines how data is represented and constrained for a particular user or application.

  • Conceptual Model (global view): all external views integrated into a single global view (the conceptual schema). The ER model is widely used here; ERD graphs the conceptual schema.

  • Internal Model: how the DBMS stores data (implementation details); internal schema shows a specific representation of the internal model and maps the conceptual model to the DBMS.

  • Data abstraction levels (from high to low): External Model → Conceptual Model → Internal Model → Physical Model (actual storage on disks/tapes).

  • The term “schema” refers to a blueprint or design; the external, conceptual, and internal schemas reflect different levels of abstraction and independence (e.g., logical independence, physical independence).

Entity-Relationship (ER) Model and Notations

  • The ER model provides a graphical representation of entities, their attributes, and relationships.

  • Chen notation (diamond for relationships):

    • Relationships are diamonds; the relationship name is inside the diamond.

  • Crow’s Foot notation is used as a design standard in the book for representing cardinalities:

    • 1:M: one entity relates to many of another (e.g., one painter can paint many paintings; a painting is painted by one painter).

    • M:N: many-to-many relationships (e.g., employees can learn many skills; each skill can be learned by many employees).

    • 1:1: one-to-one relationship (e.g., one employee manages one store; each store is managed by one employee).

  • Diagram mappings: an ER model entity maps to a relational table; relationships map to either foreign keys or bridging (intersection) entities for M:N relationships.

  • Example notations (from the transcript):

    • 1:M: PAINTER 1 .. M PAINTING (paints)

    • M:N: EMPLOYEE M .. N SKILL (learns)

    • 1:1: EMPLOYEE 1 .. 1 STORE (manages)

Relational Model vs ER Model: Key Differences

  • ER Model (conceptual design) focuses on what data is and how it relates; it is a blueprint for design.

  • Relational Model (logical/implementation) focuses on how data is stored and accessed (tables with rows and columns).

  • ER model is typically used in the database design phase; relational model is used in the implementation phase.

  • Example comparison (from the notes):

    • ER Model: Entity STUDENT, Relationship Enrolls in, connected to COURSE; attributes include NAME, MAJOR, etc.

    • Relational Model: Tables such as STUDENT(student_id, name, major) representing the same information in a table form.

Schema Levels: External, Conceptual, Internal (Illustrative Tiny College Example)

  • External schemas describe end-user views (e.g., Student Registration, Class Scheduling).

  • Conceptual schema (global): integrates external views; ER model often used to represent it.

  • Internal schema: maps conceptual model to the DBMS; examples show SQL DDL for creating real tables (CREATE TABLE …).

  • Example mappings (high-level):

    • ENTITY STUDENT maps to a relational table STUDENT with fields such as student_id, name, major, etc.

    • RELATIONSHIP ENROLL maps to an ENROLL table with foreign keys to STUDENT and CLASS.

Tables, Keys, and Integrity Rules

  • A table in a relational database is a two-dimensional structure: rows (tuples) and columns (attributes).

  • Each row represents a single entity occurrence; each column holds values of a single data type across all rows.

  • Data types common in the notes include numeric, character, date/time, and boolean. Examples: extNUMBER(9,2)ext{NUMBER}(9,2), extCHAR(8)ext{CHAR}(8), extVARCHAR(20)ext{VARCHAR}(20), extDATEext{DATE}, extBOOLEANext{BOOLEAN}.

  • Primary Key: a field (or combination) that uniquely identifies each row in a table. It cannot be NULL.

  • Candidate Keys: attributes or combinations of attributes that could serve as a primary key. A table may have multiple candidate keys.

  • Superkey: any set of attributes that uniquely identifies a row; a candidate key is a minimal superkey.

  • Composite Key: a primary key composed of more than one attribute.

  • Nulls: represent missing or unknown values; primary keys cannot be null; nulls can cause issues with COUNT, AVERAGE, SUM, and join integrity. Nulls can complicate referential integrity.

  • Entity Integrity: primary keys must be unique and non-null for every row.

  • Referential Integrity: foreign key values must reference existing primary key values in the related table; nulls in a foreign key are allowed if not part of the primary key.

  • Integrity examples from the notes include VENDOR, CUSTOMER, AGENT datasets, with examples of primary keys (e.g., VENDCODE, CUSCODE, AGENTCODE), foreign keys (e.g., AGENTCODE in CUSTOMER), and constraints like range checks and data types.

  • Data Dictionary / System Catalog: contains metadata about tables, columns, data types, constraints, and relationships; acts as the database designer’s database.

Relational Operators (Relational Algebra)

  • Relational algebra defines a theoretical way to manipulate table contents via operators. These operators can be combined (closure) to yield new relations.

  • Core operators:

    • Selection: σcondition(R)\sigma_{condition}(R) (restrict rows by a condition)

    • Projection: πattributes(R)\pi_{attributes}(R) (select specific columns)

    • Union: RSR \cup S (combine rows from two tables with the same attributes, removing duplicates)

    • Intersection: RSR \cap S (rows common to both tables)

    • Difference: RSR \setminus S (rows in R but not in S)

    • Cartesian Product (Product): R×SR \times S (all possible pairs of rows from the two tables)

    • Join: RSR \Join S (combine related columns from two or more tables based on a common attribute)

  • Examples illustrate how these operators produce new relations and support complex queries.

Joins and Join Types

  • Natural Join: joins two tables by automatically matching columns with the same name and dropping duplicates; matches on common attributes.

  • Inner Join (explicit/canonical form): returns rows with matching keys in both tables.

  • Left Outer Join: returns all rows from the left table, with matching values from the right table or NULLs where there is no match.

  • Right Outer Join: returns all rows from the right table, with matching values from the left table or NULLs where there is no match.

  • Full Outer Join: combination of left and right outer joins (retains unmatched rows from both sides).

  • Example SQL (illustrative):

    • INNER JOIN: SELECT users.name, likes.like FROM users INNER JOIN likes ON users.id = likes.user_id;

    • LEFT JOIN: SELECT users.name, likes.like FROM users LEFT JOIN likes ON users.id = likes.user_id;

    • RIGHT JOIN: SELECT users.name, likes.like FROM users RIGHT JOIN likes ON users.id = likes.user_id;

    • FULL OUTER JOIN: (various syntaxes; depends on DBMS) combines left and right outer joins.

  • Natural Join and outer joins can be combined with UNION in some cases to yield comprehensive results.

Data Dictionary, Catalogs, and Data Integrity

  • Data Dictionary: details all tables, attribute names, data types, constraints, and metadata; often considered the designer’s database.

  • System Catalog: system-defined metadata about the database catalog (tables, columns, constraints, etc.).

  • Integrity Rules emphasize safe and consistent data: entity integrity, referential integrity, and the use of dummy flag values to avoid NULLs in certain attributes when necessary.

Relationships in the Relational Model

  • 1:M (one-to-many): the relational modeling norm; common and efficiently implemented via foreign keys in the “many” side.

  • 1:1 (one-to-one): rare; may indicate that two entities should be in the same table or that one component is tightly coupled to another.

  • M:N (many-to-many): cannot be implemented directly as a single relation in the relational model; typically decomposed into two 1:M relationships via a bridge (composite) table that includes the primary keys of both related tables as foreign keys.

  • Bridge example (from the notes): STUDENT and CLASS have a many-to-many relationship; implemented via an ENROLL bridge table with primary keys (CLASSCODE, STUNUM) and a grade attribute ENROLL_GRADE.

  • Bridge table structure example (illustrative):

    • STUNUM, CLASSCODE, ENROLLGRADE, with STUNUM referencing STUDENT and CLASS_CODE referencing CLASS.

  • 1:M example: PAINTER and PAINTING; PAINTING contains PAINTER_NUM as a foreign key to PAINTER.

  • 1:1 example: PROFESSOR and DEPARTMENT, where DEPARTMENT contains EMP_NUM as a foreign key (chairs relationship).

Converting M:N Relationships to 1:M Relationships

  • To implement an M:N relationship in the relational model, replace with two 1:M relationships using a bridging (intersection) table.

  • Example: STUDENT to CLASS via ENROLL (bridge) with attributes: {CLASSCODE, STUNUM, ENROLL_GRADE} and foreign keys to CLASS and STUDENT.

  • The bridging table captures the many-to-many associations and can hold descriptive attributes (e.g., ENROLL_GRADE).

Tiny College: Conceptual, Internal, and Relational Mappings (Illustrative)

  • Conceptual model shows entities such as STUDENT, PROFESSOR, COURSE, CLASS, ENROLL, ROOM, etc., and their relationships (enrolls, teaches, is used for, generates, etc.).

  • Internal model demonstrates how these map to actual SQL CREATE TABLE statements and foreign keys (e.g., PROFESSOR(profid, proflname, proffname, profinitial), CLASS(classid, crsid, profid, roomid), ROOM(roomid, roomtype), COURSE(crsid, crsname, crs_credits)).

  • A typical relational diagram for the Tiny College database shows tables: STUDENT, PROFESSOR, COURSE, CLASS, ROOM, ENROLL, DEPT, with foreign keys linking related tables (e.g., ENROLL.CLASSCODE → CLASS.CLASSCODE; CLASS.CRSCODE → COURSE.CRSCODE; PROFESSOR.DEPTCODE → DEPARTMENT.DEPTCODE).

Practical Keys and Domain Constraints (3.1 and 3.2)

  • Each data field has:

    • A distinct name (e.g., STUNUM, STULNAME)

    • Table attributes/fields

    • A data type (Numeric, Character/String, Date/Time, Logical)

    • A domain of possible values (e.g., GPA ∈ [0.0, 4.0])

    • A table must have a PRIMARY KEY (unique identifier for the table)

  • Example of a STUDENT table schema (illustrative):

    • STUDENT(STUNUM, STULNAME, STUFNAME, STUINIT, STUDOB, STUHRS, STUCLASS, STUGPA, STUTRANSFER, DEPTCODE, STUPHONE, PROFNUM)

    • Primary key: STU_NUM

    • Data types: STUNUM (CHAR or NUMBER), STUGPA (NUMBER(3,2) or DECIMAL(3,2)), STUDOB (DATE), STUPHONE (VARCHAR), STU_CLASS (CHAR), etc.

  • Example data constraints (from the notes):

    • GPA ∈ [0.0, 4.0]

    • Domain-based values (e.g., DEPT_CODE in a fixed set)

  • Null handling: NULLs indicate unknown or missing values; avoid nulls in primary keys and consider using dummy flag values to represent absence in some attributes.

  • The concept of a primary key, candidate keys, superkeys, and composite keys is essential for identifying unique rows and establishing proper relationships across tables.

Integrity Rules in Practice

  • Entity Integrity: every row must have a unique, non-null primary key value. Example: a table Invoice must not have two rows with the same invoice number, and the invoice number cannot be NULL.

  • Referential Integrity: foreign keys must reference existing primary key values in their related tables; a foreign key can be NULL only if the business rule allows it.

  • Example scenarios (from the notes):

    • A customer may not yet have an assigned sales representative (AGENTCODE may be NULL) but if AGENTCODE is non-null, it must reference an existing AGENT.

  • The data dictionary/catalog helps enforce and document these rules by listing table names, attribute names, data types, key constraints, and references.

Examples: CUSTOMER and AGENT (Illustrative Integrity Figures)

  • Customer table example fields: CUSCODE (PK), CUSLNAME, CUSFNAME, CUSINITIAL, CUSAREA CODE, CUSPHONE, CUSINSURETYPE, CUSINSUREAMT, CUSRENEWDATE, AGENT_CODE (FK to AGENT).

  • AGENT table example fields: AGENTCODE (PK), AGENTAREACODE, AGENTPHONE, AGENTLNAME, AGENTYTD_SLS.

  • Integrity rules in this example:

    • AGENTCODE in CUSTOMER references AGENT.AGENTCODE (FK).

    • If a customer has AGENTCODE NULL, that is allowed only if the business rule permits unassigned agents; otherwise it must reference a valid AGENTCODE.

    • Primary keys are unique and non-null (PKs). Foreign keys reference existing PK values in related tables.

External and Natural Joins: Practical SQL Examples

  • Natural Join merges two tables on all columns with the same name, implicitly joining on those common attributes and removing duplicate columns in the result.

  • Left Outer Join returns all rows from the left table and matched rows from the right table; unmatched right-side rows appear as NULLs.

  • Right Outer Join returns all rows from the right table and matched rows from the left table; unmatched left-side rows appear as NULLs.

  • Example (natural join):

    • Given CUSTOMER and AGENT, a natural join on AGENT_CODE would combine customer data with their assigned agent information.

  • Example of a more explicit join: joining CUSTOMER with AGENT on AGENT_CODE to produce a combined view with customer details and their agent’s contact information.

The Tiny College: Natural Language to Relational Mapping (Summary References)

  • Tiny College example demonstrates a full mapping from ER diagrams to relational schemas, including:

    • Entities: STUDENT, PROFESSOR, COURSE, CLASS, ROOM, ENROLL, DEPARTMENT

    • Relationships: ENROLL (Student ↔ Class), TEACHES (Professor ↔ Class), ISUSEDFOR (Class ↔ Room), and more.

    • The mapping yields a relational diagram where each entity becomes a table and each relationship becomes a set of foreign keys or a bridging table for M:N relationships.

  • The consolidated relational diagram (3.29) shows how STUDENT, CLASS, ENROLL, COURSE, DEPT, PROFESSOR, ROOM, and other tables relate via keys such as STUNUM, CLASSCODE, CRSCODE, DEPTCODE, PROF_NUM, etc.

Summary: Key Takeaways for Exam Preparation

  • The Relational Model is built on tables (relations) with rows (tuples) and columns (attributes); each row is uniquely identifiable by a primary key.

  • ER modeling (Chen notation) is used for conceptual design, while the Relational Model is used for implementation; mapping ER models to relational schemas is a core skill.

  • Keys:

    • Primary Key: unique, non-null identifier for each row.

    • Candidate Keys: potential primary keys.

    • Superkey: any key that uniquely identifies a row.

    • Composite Key: a primary key composed of multiple attributes.

    • Foreign Key: a key that references a primary key in another table; used to enforce referential integrity.

  • Integrity Rules:

    • Entity Integrity: primary keys are unique and non-null.

    • Referential Integrity: foreign keys reference existing primary keys; NULLs may be allowed in some cases depending on rules.

  • Null handling is important; avoid unnecessary NULLs; consider dummy values to simplify integrity constraints where appropriate.

  • Relational Set Operators (Relational Algebra): Select, Project, Union, Intersect, Difference, Product, Join; these operators form the basis for complex queries.

  • Joins (and variations): Inner, Left Outer, Right Outer, Full Outer, Natural Join; important for combining data across tables with relational links.

  • Data Dictionary and System Catalogs store metadata about tables, attributes, data types, constraints, and relationships; critical for maintaining data governance and design decisions.

  • In 1:M relationships, implement via a foreign key on the “many” side; in M:N relationships, use a bridge table to realize two 1:M relationships.

  • Practice tasks (as in the notes): identify primary keys and foreign keys, assess entity and referential integrity, describe relationship types between named entities (e.g., STORE and REGION), and construct relational diagrams.

*** End of Notes ***