Module 7 - JOINS and Normalization
MIST 7510E - Database Management
Course: Database Management
Topic: JOINS and Normalization
Instructor: Nikhil Srinivasan
Institution: The University of Georgia, TERRY COLLEGE OF BUSINESS
SQL JOIN
Definition:
A SQL join clause combines rows from two or more database tables into a single table.
It locates related column values in the two tables.
Characteristics:
A query can contain zero, one, or multiple JOIN operations.
Different Types of JOINs
(INNER) JOIN:
Selects records that have matching values in both tables.
LEFT (OUTER) JOIN:
Selects records from the first (left) table with matching records from the right table.
RIGHT (OUTER) JOIN:
Selects records from the second (right) table with matching records from the left table.
FULL (OUTER) JOIN:
Selects all records that match either left or right table records.
Illustrating Joins
Example tables: left_table and right_table to understand JOINs better.
INNER JOIN
Definition:
An Inner Join selects all rows from both tables as long as there is a match between the columns.
Looks for matches in the right table corresponding to all entries in the key field of the left table.
Syntax:
SELECT id, l.val AS l_val, r.val AS r_val FROM left_table l INNER JOIN right_table r ON l.id= r.id;
Example from Database: INNER JOIN
Aim: Retrieve Order number corresponding to each customer.
Query:
SELECT C.customerNumber, C.customerName, O.orderNumber FROM Customers C INNER JOIN Orders O ON C.customerNumber=O.customerNumber;
LEFT JOIN
Definition:
Returns all records from the left table and the matched records from the right table.
If no match exists, the result is NULL from the right side.
Syntax:
SELECT id, l.val AS l_val, r.val AS r_val FROM left_table l LEFT JOIN right_table r ON l.id= r.id;
Example from Database (LEFT JOIN)
Aim: Retrieve Order numbers corresponding to each customer.
Query:
SELECT C.customerNumber, C.customerName, O.orderNumber FROM Customers C LEFT JOIN Orders O ON C.customerNumber=O.customerNumber;
RIGHT JOIN
Definition:
Returns all records from the right table and the matched records from the left table.
If no match exists, the result is NULL from the left side.
Syntax:
SELECT id, l.val AS l_val, r.val AS r_val FROM left_table l RIGHT JOIN right_table r ON l.id= r.id;
Example from Database (RIGHT JOIN)
Aim: Retrieve Total quantity ordered for each product.
Query:
SELECT SUM(O.quantityOrdered) Total_Quantity_Ordered, P.productCode FROM OrderDetails O RIGHT JOIN Products P ON P.productCode=O.productCode GROUP BY P.productCode;
OUTER JOIN
Definition:
Returns all rows from both left and right tables.
Unmatched rows from either table will return as NULL.
Syntax:
SELECT id, l.val AS l_val, r.val AS r_val FROM left_table l OUTER JOIN right_table r ON l.id= r.id;
Example from Database (OUTER JOIN)
Aim: Retrieve Order numbers corresponding to each customer.
Note: MySQL does not support Full Join directly, but it can be simulated using a UNION of Left and Right joins.
Query:
SELECT C.customerNumber, C.customerName, O.orderNumber FROM Customers C LEFT JOIN Orders O ON C.customerNumber=O.customerNumber UNION SELECT C.customerNumber, C.customerName, O.orderNumber FROM Customers C RIGHT JOIN Orders O ON C.customerNumber=O.customerNumber WHERE C.customerNumber IS NULL;
Normalization
Definition:
An alternative database design tool to data modeling.
The theoretical foundation for the relational model.
Application of a series of rules that gradually improve the design.
Functional Dependency
Definition:
A relationship between attributes in an entity where one or more attributes determine the value of another attribute.
Example:
stock code → firm name, stock price, stock quantity, stock dividend
Knowing the stock code provides knowledge of the firm name, etc.
Multivalued dependency Formula:
(stock dividend, stock price) → yield
Full Functional Dependency
Example:
Yield is fully functionally dependent on stock dividend and stock price as both are needed to determine the yield.
Determinants:
An attribute that fully functionally determines another attribute.
Example:
stock code determines stock PE.
Multidetermination
Definition:
A given value can determine multiple values, denoted as A → → B.
Example:
Department multidetermines course.
Attribute Relationships
One-to-One:
A value of an attribute determines the value of another attribute and vice-versa.
Example: CH → Switzerland and Switzerland → CH.
One-to-Many:
A value of one attribute determines the value of another attribute but not vice versa.
Example: country name → currency unit but currency unit does not determine country name.
Many-to-Many:
Neither attribute determines the other.
Example: country name not → language and language not → country name.
Normal Forms
Definition:
A classification of relations, often depicted like Russian dolls (nested forms).
First Normal Form (1NF)
All rows must have the same number of columns.
Single-valued attributes only.
Second Normal Form (2NF)
Violated when a nonkey column is a fact about part of the primary key.
Example: customer-credit in an ordering context.
Third Normal Form (3NF)
Violated when a nonkey column is a fact about another nonkey column.
Example: exchange rate dependent on stock code.
Other Normal Forms
Bryce-Codd Normal Form (BCNF)
Fourth Normal Form
Fifth Normal Form
Domain-Key Normal Form (DKNF)
Data Modelling and Normalization
Data modeling is often an easier path to good database design.
A high-fidelity data model will be of high normal form.
5NF can create significant complexities; check for special rules.
Data Modeling Methods
Chen's entity-relationship (E-R) approach is widely recognized.
No standard for E-R method; models share common concepts making learning transferable.
Representing Relationships
Various dialects exist for representing relationships in data modeling.
Goal
Objective: Learn to think like a data modeler; precision in representation comes after mastering basics.
Key Points
Normalization is one approach to data modeling.
Multiple representations for data models exist.
Learning to model is difficult but representing a model is easier.