Databases Flashcards

Relational Model & Introduction to SQL

Online Instructions

  • Mute microphone to avoid disruption.
  • Use the chat channel to ask questions, make comments, or raise your virtual hand.
  • Turn off video if internet is poor.
  • Wait for the lecturer to start.

In-Person Instructions

  • If you have any symptoms, even very mild, you MUST NOT stay in class. Go home, self-isolate, and seek medical advice immediately.

Acknowledgement of Country

  • UNSW Business School acknowledges the Bidjigal (Kensington campus) and Gadigal (City campus) as the traditional custodians of the lands where each campus is located.
  • Acknowledges all Aboriginal and Torres Strait Islander Elders, past and present, and their communities who have shared and practiced their teachings over thousands of years, including business practices.
  • Recognizes Aboriginal and Torres Strait Islander people’s ongoing leadership and contributions, including to business, education, and industry.

Database Design Process

  • Introduction and overview:
    • Data vs. Information
    • Data stored in databases
    • Database management system (DBMS)
    • Database design defines database structure
    • The next big thing: Big Data
  • Conceptual Model:
    • Entity-Relationship Modelling technique
    • Chen’s notation can be good for a high-level conceptual model initially
    • Crow’s Foot is the design standard in the textbook and this course
    • Entity type and entity instance
    • Attribute and value
    • Relationship (Degree, Connectivity, Cardinality)
  • Logical Model:
    • Converting Conceptual model to detailed Logical Model (using Crow’s Foot) ready for DB implementation
    • Advanced topics:
      • Relationship strength
      • Composite entity
      • Relationship degree
      • Supertype and Subtype
      • Selecting Primary Key
  • Relational Model:
    • Convert ER model to a set of tables (relations) in the relational model
    • Apply Normalisation on the relations to remove any anomalies
  • SQL Data Definition & Manipulation Language:
    • Use Relational Model to implement database by creating a table for each normalized relation
    • Data Definition Language defines the tables
    • Data Manipulation Language queries/updates the tables

Readings

  • Chapter 5 Entity Clustering 5-2
  • Chapter 3. The Relational Database Model 3-1 to 3-7
  • Chapter 7 Introduction to SQL 7-1 to 7-6
  • Chapter 8 Advanced SQL 8-2

Overview

  • Entity Clustering
  • Definition of the relational model and integrity
  • Mapping ER diagram (entities and relationships) to the relational model (schema/tables)
  • Entity clustering
  • Introduction to SQL
    • DDL (Data Definition Language): CREATE
    • DML (Data Manipulation Language): SELECT

Entity Clustering (if ERD is really big)

  • OFFERING is a cluster containing entities (and relationships among them)
    • SEMESTER
    • COURSE
    • CLASS
  • LOCATION is a cluster containing entities (and relationships among them)
    • ROOM
    • BUILDING

Relational Data Model

  • Logical ER model can be converted to a relational data model
    • Relation: two-dimensional table
    • Tuples: the rows of a relation
    • Attributes: the column headers of a relation
    • Keys: identifiers for a row of data
  • Schema for relation PRODUCT
    • PRODUCT (PROD_CODE, PROD_DESCRIPT, PROD_PRICE, PROD_ON_HAND, VEND_CODE)
  • Schema for relation VENDOR
    • VENDOR (VEND_CODE, VEND_CONTACT, VEND_AREACODE, VEND_PHONE)
  • Every relation has a unique name.
  • Attributes in tables have unique names.
  • Every attribute value is atomic, e.g., two attributes: first name, and last name
  • Every row is unique.
  • The order of the columns is irrelevant.
  • The order of the rows is irrelevant.

Database Schema

  • Relational model form database schema, which can be converted to tables
  • The name of a relation (table) and its set of attributes (column headers) are called a schema for the relation.
  • The set of schemas for all relations (tables) in a design is called a database schema.
  • Schema for relation PRODUCT
    • PRODUCT (Prod_Code, Prod_Descript, Prod_Price, Prod_On_Hand, Vend_Code)
  • Schema for relation VENDOR
    • VENDOR (Vend_Code, Vend_Contact, Vend_AreaCode, Vend_Phone)
  • Schema for relation PRODUCT
  • Schema for relation VENDOR
  • Data Dictionary (Metadata) describes schema details (e.g., table purpose, Column data types) and Other Database Objects, e.g., constraints on PK, FK, indexes (to speed up queries), default values for columns etc.

Data Dictionary

  • Data Dictionary describes schema (database) details and other objects
  • Example:
    • TABLE NAME: CUSTOMER
      • ATTRIBUTE NAME: CUS_CODE
        • CONTENTS: Customer account code
        • TYPE: CHAR55
        • FORMAT: 99999
        • RANGE: 10000-99999
        • REQUIRED: Y
        • PK OR FK: PK
      • ATTRIBUTE NAME: CUS_LNAME
        • CONTENTS: Customer last name
        • TYPE: VARCHAR2020
        • FORMAT: XXXXXXXX
        • REQUIRED: Y
        • PK OR FK:
      • ATTRIBUTE NAME: AGENT_CODE
        • CONTENTS: Agent code
        • TYPE: CHAR33
        • FORMAT: 999
        • REQUIRED:
        • PK OR FK: FK
    • TABLE NAME: AGENT
      • ATTRIBUTE NAME: AGENT_CODE
        • CONTENTS: Agent code
        • TYPE: CHAR33
        • FORMAT: 999
        • REQUIRED: Y
        • PK OR FK: PK
      • ATTRIBUTE NAME: AGENTYTDSLS
        • CONTENTS: Agent year-to-date sales
        • TYPE: NUMBER9,29,2
        • FORMAT: 9,999,999.99
        • REQUIRED: Y
        • PK OR FK:
  • PK = Primary key
  • FK = Foreign key
  • CHAR = Fixed character length data (1 - 255 characters)
  • VARCHAR = Variable character length data (1 - 2,000 characters)
  • NUMBER = Numeric data. NUMBER (9,2)(9,2) is used to specify numbers with up to nine digits, including two digits to the right of the decimal place. Some RDBMS permit the use of a MONEY or CURRENCY data type.

ANSI/ISO SQL Data Types (not all are in Oracle)

  • CHAR: Fixed-length character strings
  • VARCHAR: Variable-length character strings (VARCHAR and VARCHAR2 both work for Oracle)
  • INTEGER: Integer numbers
  • BIT: Fixed-length bit string (bit array)
  • NUMERIC: Decimal numbers (NUMERIC and NUMBER both work for Oracle)
  • FLOAT: Floating point numbers
  • DATE: Calendar date
  • TIME: Clock time
  • TIMESTAMP: Date and time

Keys

  • Primary Key: The primary key of any table is any candidate key of that table which the database designer arbitrarily designates as “primary”
  • Candidate Key: Any set of one or more columns whose combined values are unique among all occurrences (i.e., tuples or rows).
  • Alternate (Secondary Key): The alternate keys of any table are those candidate keys which are not currently selected as the primary key
  • Foreign Key: A set of one or more columns in any table which may hold the values found in the primary key column of another table

Database integrity constraints

  • Entity Integrity
    • Requirement: All primary key entries are unique, and no part of a primary key may be NULL.
    • Purpose: Ensures each row has a unique identity, allowing foreign keys to reference primary keys correctly.
    • Example: No invoice can have a duplicate or NULL number; all invoices are uniquely identified by their invoice number.
  • Referential Integrity
    • Requirement: A foreign key may either be NULL or match an existing primary key value in the referenced table.
    • Purpose: Prevents invalid foreign key values and ensures consistency between related tables.
    • Example: A customer’s sales agent number must be valid in AGENT table.
  • Domain Integrity
    • Requirement: Data values must conform to the column’s predefined data type and constraints.
    • Purpose: Ensures data accuracy by restricting entries to a valid format.
    • Example: A date column can only accept the dates in the past 10 years (not a future date, not a very old and irrelevant date)

Example: entity integrity and referential integrity

  • Entity Integrity
    • The CUSTOMER table's primary key is CUS_CODE.
    • The AGENT table's primary key is AGENT_CODE.
    • Both primary key columns have no NULL values and contain unique entries.
  • Referential Integrity
    • The CUSTOMER table has a foreign key, AGENT_CODE, linking it to the AGENT table.
    • The row with CUSCODE = 10013 has a NULL value in AGENTCODE because Paul F. Olowski has no assigned sales representative.
    • All other AGENTCODE values in the CUSTOMER table correctly match existing AGENTCODE values in the AGENT table.

Logical Model to Relational Model

  • Make sure the logical model had taken care of composite attributes and multi-value attributes
  • Translate each entity (including original entity, child entity, and composite entity) into a relation (table)
  • The attributes of the entity become the attributes of the relation
  • Translate relationships to FK links
  • Preferable to use only these component attributes as columns
  • Composite Attributes
    • Customer_Address
      • Street
      • City
      • State
      • Post Code
    • CUSTOMER
      • Customer_ID
      • Customer_Name
      • Street
      • City
      • State
      • Post Code
  • PK of PAINTING is only PAINTINGNUM (not derived from parent entity’s PK, PAINTERNUM)
  • Is PAINTING a strong entity? Yes
  • Is the relationship between PAINTING and PAINTER a strong identifying one? No

Mapping 1:1 Unary/Recursive relationship to relational model

  • Composite entity becomes a relation (table)
  • PERSON (ID, BIRTHDATE, NAME,...)
  • Is Married to PERSON
  • MARRIAGE (ID, SPOUSE_ID)

Mapping 1:M Unary/Recursive relationships to relational model

  • EMPLOYEE (EMP_ID, EMP_NAME, DATE_OF_BIRTH, DEPT_ID, MANAGER_ID)
  • MANAGERID is EMPID too because a manager is also an employee
  • One employee (manager) can manage multiple employees (subordinates)
  • One employee (subordinate) is managed by at most one manager
  • An employee is optional in the relationship as a subordinate, i.e., does not need to have a manager (e.g., CEO may not have a manager)
  • An employee is optional in the relationship as a manager, i.e., An employee may not manage anyone
  • A recursive FK is a foreign key in a relation (table) that references the primary key values of the same relation
  • In the example, for some employees, the ManagerID could be “NULL” because they do not have a manager.

Mapping M:N Unary/Recursive relationships to relational model

  • COURSE (COURSE_ID, COURSE_NAME, ...)
  • PREREQUISITE (COURSE_ID, PREREQ_COURSE_ID, ...)
  • PREREQUISITE table serves as a composite entity to break down the many-to-many relationship between courses and prerequisites
  • A PREREQUISITE instance can be linked to exactly one course as the base course (COURSE_ID in PREREQUISITE must be valid inside COURSE)
  • A PREREQUISITE instance can be linked to exactly one course as the pre-req (PREREQCOURSEID must be valid inside COURSE)
  • A base course can be linked to multiple PREREQUISITE instances
  • A pre-req course can be linked to multiple PREREQUISITE instances

Supertype and Subtypes

  • Shall we create only one big relation for the supertype and all subtypes?
    • EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INIITIAL, EMP_HIRE_DATE, PIL_LICENSE, PIL_RATINGS, PIL_MED_TYPE, MEC_TITLE, MEC_CERT, ACT_TITLE, ACT_CPA_DATE)
    • Problems?
      • Hard to ensure data integrity - only the correct attributes shall be filled for a PILOT
      • Waste of storage space - Many null values in attributes PILLICENSE, MECTITLE, ACT_TITLE
      • Inefficient to retrieve employees of a specific type, e.g., to retrieve just pilots, we need to filter out most rows that are not PILOT
      • EMP_TYPE is not needed in this case - all normal employees, pilots, mechanics, and accountants merge into one table.
      • Easiest to implement
  • Shall we create a separate relation for each subtype and ignore the supertype?
    • Issues?
      • Common attributes, e.g., EMPLNAME, EMPFNAME, are scattered across 3 tables
      • To find one employee’s generic details (e.g., EMPHIREDATE), we must search in 3 tables
      • Enforcing disjoint constraint becomes tricky, e.g., how to prevent EMP_NUM 1 from existing in multiple subtype tables
      • PILOT (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INIITIAL, EMP_HIRE_DATE, PIL_LICENSE, PIL_RATINGS, PIL_MED_TYPE)
      • ACCOUNTANT (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INIITIAL, EMP_HIRE_DATE, ACT_TITLE, ACT_CPA_DATE)
  • We shall create separate relations for each subtype and the supertype
    • Generic details of all employees are stored once in the EMPLOYEE table
    • Subtype-specific attributes are stored in the subtype table, saving data storage space
    • Updating an employee’s specific information would modify only 1 of the 4 tables
    • Easier to ensure integrity that an employee belongs to only one subtype
    • More efficient to retrieve an employee – no need to combine multiple tables
      • EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INIITIAL, EMP_HIRE_DATE, EMP_TYPE)
      • ACCOUNTANT (A_EMP_NUM, ACT_TITLE, ACT_CPA_DATE, EMP_NUM)
  • MECHANIC (M_EMP_NUM, MEC_TITLE, MEC_CERT, EMP_NUM)
  • PILOT (P_EMP_NUM, PIL_LICENSE, PIL_RATINGS, PIL_MED_TYPE, EMP_NUM)

Header-Details pattern: Mapping an M:N Relationship in e-commerce

  • How is this useful?
    • Avoids repeated order-level data, by separating that from item-level data, i.e., order-level details (e.g., OrderDate) are stored once in the ORDERHEADER, while individual items are stored in ORDER_DETAILS
    • Data integrity - every item in ORDERDETAILS must belong to a valid order in ORDERHEADER
    • Easier to find all items within an order from table ORDER_DETAILS
    • ORDER_HEADER
      • PK Order_ID
      • Order_Date
    • ORDER_DETAILS
      • PK, FK1 Order_ID
      • PK Product_ID
      • Unit_Price
      • Quantity
        *Note: The relationship between entities is not shown in this.
    • ORDER Purchase PRODUCT

Header-Details pattern is also used in invoicing

  • New column LINENUMBER as part of PK (instead of PRODCODE)
  • “redundant” column LINEPRICE is duplicate as PRODPRICE – why?
  • CUSTOMER
    • PK CUS_CODE
    • CUS_LNAME
      • INVOICE
        • PK INV_NUMBER
        • FK CUS_CODE
        • INV_DATE
      • LINE
        • PK INV_NUMBER
        • PK PROD_CODE
        • PK LINE_NUMBER
        • FK PROD_CODE
        • LINE_UNITS
        • LINE_PRICE
      • PRODUCT
        • PROD_DESCRIPT
          • PROD_PRICE
            • PRODONHAND
              • VEND_CODE

Recap

  • ER entities become relational schemata (relations).
  • Entity relationships become relational schema or references with FKs in the tables.
  • ER attributes of an entity become column headers.
  • Instances are the rows (a.k.a. tuples) in the actual tables.
  • Connectivity is indirectly expressed through:
    • Primary Key (PK) uniqueness
    • Foreign Key (FK) links
    • NOT NULL Constraint enforcing certain FK columns must have valid values, hence imposing mandatory participation.
  • Cardinality is indirectly expressed through:
    • number of rows in table X that table Y’s one particular row can be linked to
    • this uses rather technical constraints, hence optional when we use SQL to implement the ERD

Using Oracle SQL Developer

  • Two options:
    • download SQL application and install on your laptop, by following Oracle SQL Developer Setup guide
    • access SQL Developer online
  • You will receive an email with your Oracle SQL login details.
  • The Oracle Lab Manual will cover SQL syntax, explaining each statement step by step. The content is very complete and detailed and SERVES ONLY AS A REFERENCE.
  • The SQL codes used in the manual are in the SQL Demo scripts, which also SERVES ONLY AS A REFERENCE.

Structured Query Language (SQL)

  • Easy to learn
    • nonprocedural language with a basic command vocabulary set of fewer than 100 words
  • First standard database language, originally developed by D. Chamberlin and R. Boyce at IBM
  • Most common SQL standard today: ANSI/ISO SQL (Originally defined in 1988, SQL-86, but gone through major revisions)
  • Microsoft, Oracle, and other vendors have introduced deviations from ANSI SQL; but largely very similar
  • SQL functions fall under broad categories:
    • Data Definition Language (DDL)
      • Defines and modifies database structures (e.g., CREATE, ALTER, DROP)
  • Data Manipulation Language (DML)
    • Used to retrieve, insert, update, and delete data (e.g., SELECT, INSERT, UPDATE, DELETE)
  • Transaction Control Language (TCL)
    • Manages transaction consistency within the database (e.g., COMMIT, ROLLBACK, SAVEPOINT)
  • Data Control Language (DCL)
    • Controls user access to the database, including granting and revoking permissions (GRANT, REVOKE). (Not covered in this course.)
  • A transaction is a logical unit of work that consists of one or more SQL statements
  • All included operations must either fully complete or fully rollback in case of failure

SQL keywords

  • SQL contain user-defined words and keywords, including command keywords
    • Category: DDL (Data Definition Language)
      • Command: CREATE
        • Keyword: Explanation Creates new tables, indexes, or database structures
  • Category: DML (Data Manipulation Language)
    • Command: INSERT
      • Keyword: Explanation Adds new rows of data to a table
      • Keyword: Explanation Removes existing rows from a table
  • Complete SQL statements consists of:
    • Reserved keywords: fixed part of the language – pay special attention to the command keywords below
    • User-defined words: the meaning of the data to the user (e.g., “users”, “bookings”)

CREATE

  • Let’s create database structure
  • CREATE SCHEMA AUTHORIZATION Creator
  • CREATE TABLE tablename ( column1 data type [constraint] [, column2 data type [constraint] ] [, PRIMARY KEY (column1 [, column2]) ] [, FOREIGN KEY (column1 [, column2]) REFERENCES tablename] [, CONSTRAINT constraint ] );
  • DATA DEFINITION LANGUAGE
  • CREATE SCHEMA AUTHORIZATION zXXXXX; Creator must be the same as your userid, Oracle schema zXX has been created for you, in which tables can be created
  • CREATE TABLE CLASS ( CLASS_ID INTEGER NOT NULL, PROF_ID INTEGER NOT NULL, COURSE_ID INTEGER NOT NULL, SEMESTER VARCHAR2(10) NOT NULL, SECTION VARCHAR2(10) NOT NULL, PRIMARY KEY (CLASS_ID), FOREIGN KEY (PROF_ID) REFERENCES PROFESSOR(PROF_ID), FOREIGN KEY (COURSE_ID) REFERENCES COURSE(COURSE_ID), CONSTRAINT unique_crs_sem_sec UNIQUE (COURSE_ID, SEMESTER, SECTION));
  • Mandatory information for each class
  • Foreign key referencing the professor
  • Foreign key referencing the course

CREATE: More details on constraints

  • PRIMARY KEY: can be in the column definition, or at the end of table definition as a constraint
  • FOREIGN KEY: cannot delete a VENDORID from the VENDOR table if at least one product record references that VENDORID
  • NOT NULL: column cannot be empty, i.e., mandatory column
  • UNIQUE: all values in a column or a combination of columns must be unique
  • DEFAULT: a default value for a column when a new row is created
  • CHECK: validate data when a value is entered for a column
  • DATA DEFINITION LANGUAGE
    • CREATE TABLE VENDOR ( VENDOR_ID VARCHAR2(10), VENDOR_NAME VARCHAR2(100) NOT NULL, CONTACT_EMAIL VARCHAR2(100) UNIQUE, COUNTRY VARCHAR2(50) DEFAULT 'AUSTRALIA', PRIMARY KEY ( VENDOR_ID ) );
    • CREATE TABLE PRODUCT ( PRODUCT_ID VARCHAR2(10) PRIMARY KEY, PRODUCT_NAME VARCHAR2(100) NOT NULL, PRICE NUMBER(10, 2) NOT NULL CONSTRAINT PRICE_CHECK CHECK ( PRICE > 0 ), STOCK NUMBER DEFAULT 0 CONSTRAINT STOCK_CHECK CHECK ( STOCK >= 0 ), VENDOR_ID VARCHAR2(10) NOT NULL, CONSTRAINT PRODUCT_VENDOR_FK FOREIGN KEY ( VENDOR_ID ) REFERENCES VENDOR ( VENDOR_ID ) );

SELECT: retrieves and displays data from one or more tables

  • SELECT [DISTINCT | ALL] column_expression [AS new_name] [, ...] FROM table_name [alias] [, ...] [WHERE condition] [GROUP BY column_list] [HAVING condition] [ORDER BY column_list];
    • [ ] indicates optional elements. | indicates “or.” ; indicates the end of the statement.
  • DATA MANIPULATION LANGUAGE
    • SELECT – specifies the columns to be returned by the query
    • FROM – specifies the table(s) from which the data will be retrieved
    • WHERE – filters the rows of data based on provided criteria

SELECT: show all rows and columns

  • DATA MANIPULATION LANGUAGE
  • SELECT ALL FROM PRODUCT;
  • OR
  • SELECT * FROM PRODUCT; -- wild card

SELECT: pick a few columns

  • SELECT P_CODE, P_DESCRIPT FROM PRODUCT;
    • List all Product codes and product descriptions
  • SELECT V_CODE FROM PRODUCT;
  • SELECT DISTINCT V_CODE FROM PRODUCT;
    • Keyword DISTINCT eliminates duplicates
  • DATA MANIPULATION LANGUAGE

SELECT: column alias and computed columns

  • `SELECT PCODE, PDESCRIPT AS DESCRIPTION, P_PRICE AS