Note

0.0(0)

Take a practice test

Chat with Kai

undefined Flashcards

0 Cards0.0(0)

Explore Top Notes

Lecture 1- Photosynthesis

Note

Studied by 3 people

5.0(1)

Chapter 7: Interpersonal Writing: E-mail Reply

Note

Studied by 85 people

5.0(2)

AP Psych Unit 0 Vocab

Note

Studied by 12 people

5.0(1)

GEOL 101: Final Exam Review

Note

Studied by 222 people

5.0(2)

Chapter 12: Quantitative Skills and Biostatistics

Databases Lecture Notes Flashcards - W7

Normalization Part 2 & SQL Wrap-Up

Introduction

This lecture covers Normalization Part 2 and SQL wrap-up.
Topics include Normal Forms (1NF, 2NF, 3NF, BCNF) and SQL wrap-up (Aggregate functions, GROUP BY, Multiple table operation, Data Definition Language, Data Manipulation Language, Transaction Control Language).

Database Design Process Overview

Data vs. Information
Conceptual Model: Entity-Relationship Modelling technique.
- Chen’s notation: good for high-level conceptual model initially.
- Crow’s Foot: the design standard in the textbook and this course.
- Entity type and entity instance.
- Attribute and value.
- Relationship (Degree, Connectivity, Cardinality).
Logical Model: Converting Conceptual model to detailed Logical Model (using Crow’s Foot) ready for DB implementation.
- Advanced topics:
  - Relationship strength
  - Composite entity
  - Relationship degree
  - Supertype and Subtype
  - Selecting Primary Key
Relational Model: Convert ER model to a set of tables (relations) in the relational model.
- Apply Normalisation on the relations to remove any anomalies
SQL Data Definition & Manipulation Language
- Use Relational Model to implement database by creating a table for each normalised relations
- Data Definition Language defines the tables
- Data Manipulation Language queries/updates the tables

Relevant Chapters

Chapter 6. Normalisation of Database Tables (6-7 to 6-8)
Chapter 7. Introduction to SQL (7-8 to 7-10)
Chapter 8. Advanced SQL (8-1 to 8-4)

Overview

Normal Forms
- Recap 1NF, 2NF, 3NF
- BCNF
SQL wrap-up
- Aggregate functions
- GROUP BY
- Multiple table operation
- Data Definition Language (DROP, ALTER)
- Data Manipulation Language (INSERT, DELETE, UPDATE)
- Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT)

Recap of Normalization

Normalization is a table design technique aimed at minimizing data redundancies.
Normalization focuses on the characteristics of specific entities.
The first three normal forms (1NF, 2NF, and 3NF) are most commonly used.
Best practice: continue the iterative ER process until all entities and their attributes are defined and all equivalent tables are in 3NF.
Normalization is an important part—but only a part—of the design process.
Proposed entities must meet the required normal form before table structures are created.
Principles and normalisation procedures to be understood to redesign and modify databases.
ERD is created through an iterative process.

Recap Example: Reports in a Construction Company

Building project has: Project number, Name, Employees assigned to the project.
Employee has: Employee number, Name, Job classification.
The company charges its clients by billing the hours spent on each project.
The hourly billing rate is dependent on the employee’s position.

Issues

Many multivalued attributes
Trying to fit multiple employees' data into one row for one project
Many data redundancies, e.g., employee data JOBCLASS, CHARGEHOUR is in many rows because employees can work on multiple projects
Data inconsistencies, e.g. “Database Designer” or “DB Designer”
All above could lead to insertion, deletion, update anomalies

1NF: Eliminate Repeating Groups

Eliminate the repeating groups and identify the Primary Key to fulfill 1NF

Table Name: RPT_FORMAT

PROJ_NUM	PROJECT_NAME	EMP_NUMBER	EMP_NAME	JOB_CLASS	CHARGE_HOUR	HOURS_BILLED
15	Evergreen	103,101,105, 106, 102	June E. Arbough, John G. News, Alice K. Johnson
*, William Smithfield, David H. Senior	Elec. Engineer, Database Designer, Database Designer, Programmer, Systems Analyst	85.5, 105., 105., 35.75, 98.75	48.1, 18.36
18	Amber Wave	114, 118, 104, 112	Annelise Jones, James J. Frommer, Anne K. Ramoras *, Darlene M. Smithson	Applications Designer, General Support, Systems Analyst, DSS Analyst	23.8, 19.4, 35.7, 12.6, 23.8	96.75, 45.95

Table Name: DATAORG1NF

PROJ_NUM	PROJ_NAME	EMP_NUM	EMP_NAME	JOB_CLASS	CHG_HOUR	HOURS
15	Evergreen	103	June E. Arbough	Elect. Engineer	84.50	23.8
15	Evergreen	101	John G. News	Database Designer	105.00	19.4
15	Evergreen	105	Alice K. Johnson	Database Designer	105.00	35.7
15	Evergreen	106	William Smithfield	Programmer	35.75	12.6
15	Evergreen	102	David H. Senior	Systems Analyst	96.75	23.8
18	Amber Wave	114	Annelise Jones	Applications Designer	48.10	24.6
18	Amber Wave	118	James J. Frommer	General Support	18.36	45.3
18	Amber Wave	104	Anne K. Ramoras	Systems Analyst	96.75	32.4
18	Amber Wave	112	Darlene M. Smithson	DSS Analyst	45.95	44.0

Primary Key: (PROJNUM, EMPNUM)

2NF and 3NF: Remove Dependencies

No repeating groups to meet 1NF requirement
Remove partial dependency for 2NF
Remove transitive dependency for 3NF
Dependency Diagram – how do we fulfill 2NF, 3NF?

2NF: Composite Entity for Many-to-Many Relationships

Create new table(s) to capture the relationship between the new tables after the split, in this case:
- A project can contain multiple employees. An employee can be working on multiple projects concurrently. This is a many-to-many relationship.
- Recall that in ERD, many-to-many relationships shall be decomposed to two 1:M relationships
- Hence a composite (bridge) entity must be used to capture the employees assigned to each project
Remove partial dependencies and check if a composite entity is needed, to fulfill 2NF

2NF: Remaining Attributes and Composite Keys

Find the right tables for remaining attributes outside the partial dependency, in this case:
- The new assignment entity is the composite entity bridging Project and Employee and can include HOURS as an additional attribute, specific to the relationship between 1 project and 1 employee.
- Attribute name changes from HOURS to ASSIGN_HOURS to make it more meaningful.
Having remaining attributes that are not in any partial dependency but are dependant on the original composite PK could suggest the need of a new entity with that composite PK
What if PROJECT and EMPLOYEE have a 1-to-many relationship? Do we absolutely need a composite entity ASSIGNMENT? Which table would ASSIGN_HOURS go into?
Find the right tables for remaining attributes.
Remaining attribute(s) going into a new table.

3NF: Remove Transitive Dependency

Remove transitive dependency by making new tables & reassigning attributes for 3NF
For every transitive dependency, put determinant in a new table as PK; place the dependent attributes in the same table as the determinant; remove the dependents from the original table.
In this case, we put JOBCLASS in a new table JOB, place CHGHOUR in the new table, and remove it from the original table
How can Charge Per Hour be linked to an employee?
CHGHOUR is dependent on JOBCLASS in JOB table, and JOB_CLASS is associated with each employee

Boyce-Codd Normal Form (BCNF)

Considered a special case of 3NF, to cover some specific aspects and problems with the 3NF, also known as 3.5NF
Not always achievable (in contrast, 3NF is always achievable)
Can be violated only when the table contains more than one candidate key (Equivalent to 3NF when the table contains only one candidate key)
Already 3NF AND every determinant in the Dependency Diagram should be a complete candidate key
Every determinant of every functional dependency must be a complete CK
When we have multiple candidate keys…
- Two candidate keys:
  - {CLASS_CODE}, chosen as PK
  - {CRSCODE, CLASSSECTION}
- 2NF and 3NF must be adhered to when we check the PK and all other candidate keys
- Is this 2NF? Yes — Can part of PK or CK determine other non-prime attributes? No
- Is this 3NF? Yes — Can a non-prime attribute determine another non-prime attribute? No

Case: 3NF but not BCNF

This is 3NF with no partial or transitive dependency
CLASSCODE -> STAFFID
- Not partial, because dependant STAFF_ID is a prime attribute (partial dependency only applies when the dependant is a non-prime attribute)
- Not transitive because CLASSCODE is a prime attribute, being part of a candidate key {STUID, CLASS_CODE} (transitive means non-prime attributes being determined by other non-prime attributes)
Not BCNF: in CLASSCODE -> STAFFID, the determinant CLASS_CODE is not a complete CK
Intuitively, the table is dual-purpose, staff assignments to classes AND enrolment, causing anomalies
Update Anomaly: if a different staff is assigned to CLASS_CODE 123, all the rows in the table for that class must be updated correctly
Deletion Anomaly: if all 5 existing students dropped out of CLASS_CODE 123 in week 3, after these 5 rows are deleted, information about who taught that class is lost forever

BCNF Violation with Multiple Candidate Keys

BCNF might be violated when multiple CK exists
For a table to be in BCNF, every determinant in all dependencies must be a complete CK
Need to remove the dependency CLASSCODE -> STAFFID, which is not partial or transitive, but it is problematic
Solution: decompose the table

Decompose Table to Convert 3NF to BCNF

Now every determinant in a dependency in each table is a candidate key
In this case, both tables have only 1 candidate key, so 3NF and BCNF clearly both hold
- {STUID, CLASSCODE} is the only candidate key in the left table
- {CLASS_CODE} is the only candidate key in the right table

Denormalization

Denormalization is needed to avoid excessive computation (e.g., JOIN, CASE, maths)

Reason	Example	Benefits of denormalisation	Control needed to avoid anomalies
Redundant Data	Storing ZIP and CITY in AGENT table when ZIP determines CITY	Reduce frequent JOIN to ZIP_CITY table; Improve performance lag in queries; Dropdown on user interface, where city is auto-populated based on zip code	Periodic data consistency checks
Derived Data	Storing STUCLASS (Freshman, Junior…) in STUDENT table derived from STUHRS (student total hours in class to date)	Reduce runtime calculations (CASE WHEN STU_HRS < 30 THEN 'Freshman’ …); Avoid complex query doing calculation on the fly	Automatic recalculation is only triggered when STUHRS change; STUCLASS can be made read-only field to prevent any manual edits
Pre-aggregated Data	Storing STU_WAM permanently in STUDENT table, redundant data based on each course score	Avoid expensive WAM maths recalculations upon every SELECT query; WAM auto-updated only when grades change	Admin-only access to update aggregated WAM
Temporary Reporting	Tables Creating pivoted table with repeating groups for reporting purpose (more details next)	Reduce complex SQL JOIN and pivoting operations	Temp tables are automatically deleted after report issuance; Temp tables’ write access must be restricted to only reporting team

Denormalized Temporary Table for Reporting

The last four semesters taught are not necessarily the same for all staff, e.g., sabbaticals, new joiner
Normalised EVALDATA table is the master table
FACHIST table (faculty history) is a temporary denormalised table created from EVALDATA table
FACHIST table is used for reporting and filtering, e.g., filter by department, by feedback score (e.g., >3.5)

SQL Wrap-Up: SELECT Query Structure

SELECT [DISTINCT | ALL] {* | [column_expression [AS new_name]] [, …] }
FROM table_name [alias] [, …]
[WHERE condition]
[GROUP BY column_list]
[HAVING condition]
[ORDER BY column_list];

[ ]: indicates optional elements.
{ }: indicates that the element may or may not appear.
|: indicates “or”.
;: indicates the end of the statement.

Oracle SQL Aggregate Functions

Aggregate functions can summarize statistics of selected rows
- COUNT: number of selected rows.
- MAX: maximum value within selected rows
- MIN: minimum value within selected rows
- AVG: arithmetic mean (average) for the selected rows.
- SUM: total value for a numeric attribute within selected rows.

-- Example 1
-- How many vendors referenced in the PRODUCT table have supplied products with prices that are less than or equal A$10.00?
SELECT COUNT(DISTINCT V_Code)
FROM PRODUCT
WHERE P_Price <= 10.00;

-- Example 2
-- What is the max price for all products?
SELECT MAX(P_Price)
FROM PRODUCT;

All these functions yield only one single value

Oracle SQL Aggregate Functions: SUM, AVG

-- Example 3
-- What is the total dollar value of all the items on hand in our inventory?
SELECT SUM(P_Onhand * P_Price)
FROM PRODUCT;

-- Example 4
-- What is the average of product price in apple category?
SELECT AVG(P_Price)
FROM PRODUCT
WHERE lower(P_Description) LIKE ‘%apple%’;

SQL Wrap-Up: Nested Queries

SELECT MAX(P_Price)
FROM PRODUCT;

-- This previous query shows the highest price from the product table…
-- …but does not show the product details
SELECT P_Code, P_Description, P_Price
FROM PRODUCT
WHERE P_Price = (SELECT MAX(P_Price) FROM PRODUCT);

We need a nested query (query in a query)
The nested query (inner query, subquery) is performed first, and the single-value result is used for the outer query

SQL Wrap-Up: GROUP BY and HAVING

SELECT …
FROM …
GROUP BY <column>
HAVING <condition>

Groups the rows from SELECT operation and produces a single summary row for each group
The GROUP BY clause is valid only when SELECT contains aggregate functions, such as AVG, SUM, COUNT

-- How many product units per product description are available from the vendor with vendor code 21344? (min 200 units)
SELECT V_Code, P_Description, SUM(P_Onhand)
FROM PRODUCT
WHERE V_Code = 21344
GROUP BY V_Code, P_Description
HAVING SUM(P_Onhand) > 200;

SQL Wrap-Up: Multiple Table Operations

Multiple table operations are similar to JOIN, but no explicit “JOIN … ON”

-- List all student details in course ‘COMM2822’:
SELECT StdNo, StdName, Major, CourseNo, Grade
FROM Student, Enrolment
WHERE Student.StdNo = Enrolment.StdNo
  AND Enrolment.CourseNo = 'COMM2822';

-- List the student and course details for student number 38214 (Note that we use aliases instead of the full table names)
SELECT StdName, CourseTitle, Grade
FROM Student S, Enrolment E, Course C
WHERE S.StdNo = E.StdNo
  AND C.CourseNo = E.CourseNo
  AND E.StdNo = 38214

SQL Wrap-Up: Overview of SQL Commands Grouped in Categories

Category	Command	Explanation
DDL (Data Definition Language)	CREATE	Creates new tables, indexes, or database structures
	ALTER	Modifies existing database structures, e.g., adding or removing columns
	DROP	Removes tables, indexes, or other database objects permanently
DML (Data Manipulation	INSERT	Adds new rows of data to a table
	DELETE	Removes existing rows from a table
	UPDATE	Modifies existing data within a table
	SELECT	Retrieves data from one or more tables, allowing filtering, aggregation
TCL (Transaction Control	COMMIT	Saves all changes made during the current transaction to the database
	ROLLBACK	Reverts changes made in the current transaction if an error occurs
	SAVEPOINT	Creates a temporary save point within a transaction, allowing partial rollbacks

INSERT Statement

INSERT can add raw data into a table or add data from another table

-- INSERT raw data
INSERT INTO table_name [(column_name, …)]
VALUES (data_value, …);

-- INSERT data from an existing table
INSERT INTO table_name (column_name, …)
SELECT column_name, …
FROM copy_from_table_name
[WHERE (condition)]

-- Add a new customer
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES ('Kai Trading', 'John Wick', 'Houston Rd 1', 'Kensington', '2033', 'Australia’);

-- we launched a truck fleet service and can serve all our suppliers to ship their products; hence all suppliers become customers
INSERT INTO Customers (CustomerName, City, Country)
SELECT SupplierName, City, Country
FROM Suppliers;

UPDATE Statement

UPDATE can update selected rows and columns

UPDATE table_name
SET column_name = new_data_value, …
[WHERE (condition)]

-- Kai Trading moved to a new address
UPDATE Customers
SET City       = 'Bondi Junction',
    PostalCode = '2022',
    Address    = 'Council St 23'
WHERE CustomerName = 'Kai Trading';

DELETE Statement

DELETE FROM table_name
[WHERE (condition)]

WHERE clause specifies which rows should be deleted
If you do not have a filter with WHERE, all records in the table will be deleted!
Think a million times if you do NOT need the WHERE filter

-- Mohit Metals (CustomerID 333) stops the contract with our company
DELETE
FROM Customers
WHERE CustomerID = 333;

-- This should just delete one row with PK being 333

ALTER Statement

ALTER can change table structure by adding or dropping columns

ALTER TABLE table_name
ADD column_name datatype;

ALTER TABLE table_name
MODIFY column_name new_datatype;

ALTER TABLE table_name
DROP COLUMN column_name;

-- Add a new column to table customers
ALTER TABLE Customers
ADD Phone VARCHAR2(15);

-- Change the new column Phone to NUMBER data type (only works if column is empty or all values in the column are indeed convertible)
ALTER TABLE Customers
MODIFY Phone NUMBER (15);

-- Drop the column Phone
ALTER TABLE Customers
DROP COLUMN Phone;

DROP Statement

DROP can drop the entire table, if no violation of referential integrity
Foreign Key
Is this 1-to-1 or 1-to-many relationship? 1-to-many
Which entity is the child, i.e. the “many” side? Customers
What if we try to drop Sales_agents table before dropping Customers table?
AgentID 101 and 102 in Customers table would be invalid – referential integrity violation

SAVEPOINT, ROLLBACK, and COMMIT

SAVEPOINT name; creates a named checkpoint within a transaction, allowing partial rollbacks
ROLLBACK TO name; reverts changes to a specific SAVEPOINT
ROLLBACK; undo ALL changes since the last COMMIT OR the first DML statement of the current set of uncommitted changes (in Oracle only)
COMMIT; marks end of transaction and save changes

COMMIT;

INSERT INTO employees (id, name)
VALUES (1, 'Mickey’);

SAVEPOINT after_insert;

UPDATE employees
SET name = 'Donald'
WHERE id = 1;

SAVEPOINT after_update;

DELETE
FROM employees
WHERE id = 1;

-- Partial rollback: undo DELETE
ROLLBACK TO after_update;

UPDATE employees
SET name = 'Daisy'
WHERE id = 1;

ROLLBACK;

-- Full rollback: undo all changes since the last COMMIT and we revert to an empty table
COMMIT;

Recap and Next Lecture

BCNF
SQL wrap-up
- Aggregate functions
- GROUP BY
- Multiple table operation
- Data Definition Language (DROP, ALTER)
- Data Manipulation Language (INSERT, DELETE, UPDATE)
- Transaction Control Language (COMMIT, ROLLBACK, SAVEPOINT)
Next lecture Recap
- Relational Algebra
- Database Development