Chapter 3 Data Management
Retail Sales and Catalog Content
Data Extraction
The Cape Codd marketing department requires an analysis of:
In-store sales – Data on sales that occur in physical store locations.
Catalog content – Information about products available in catalogs.
Instead of accessing the entire database, only a relevant subset of retail sales data and catalog content is needed.
The IS department (Information Systems department) extracts the required data from the operational database and stores it in a separate, offline database for the marketing team to analyze.
Operational Database – A live database that supports real-time transactions.
Offline Database – A database used for reporting and analysis, separate from real-time operations.
SQL as a Data Sublanguage
SQL (Structured Query Language) is not a full-fledged programming language; instead, it is a data sublanguage.
SQL is specifically designed for creating and processing database data and metadata.
SQL is ubiquitous in modern enterprise-class DBMS (Database Management Systems).
SQL Categories
SQL statements can be grouped into five categories based on their function:
Data Definition Language (DDL)
Used for creating database structures, including tables and relationships.
Example:
CREATE TABLE
,ALTER TABLE
,DROP TABLE
Data Manipulation Language (DML)
Used for inserting, querying, modifying, and deleting data.
Example:
SELECT
,INSERT
,UPDATE
,DELETE
SQL/Persistent Stored Modules (SQL/PSM)
Extends SQL by adding procedural programming features like loops and conditional logic.
Example:
BEGIN...END
,DECLARE
,IF...THEN
Transaction Control Language (TCL)
Used for managing transactions (groups of SQL operations that must be executed together).
Example:
COMMIT
,ROLLBACK
,SAVEPOINT
Data Control Language (DCL)
Used to grant or revoke access permissions for users.
Example:
GRANT
,REVOKE
The SQL SELECT/FROM/WHERE Framework
The core SQL query structure is the SELECT statement, which retrieves data from a database.
SELECT {ColumnName(s)} → Specifies which columns to retrieve.
FROM {TableName(s)} → Specifies which table(s) the data comes from.
WHERE {Condition(s)} → Filters the data based on specified conditions.
Retrieving Data Using SQL
Selecting Specific Columns from a Single Table
Example:
SELECT SKU, SKU_Description, Department, Buyer
FROM SKU_DATA;
This query retrieves only the
SKU
,SKU_Description
,Department
, andBuyer
columns from theSKU_DATA
table.
Selecting All Columns Using the SQL Asterisk (*) Wildcard
Example:
SELECT * FROM SKU_DATA;
The
*
wildcard selects all columns from theSKU_DATA
table.
Eliminating Duplicates
Use the DISTINCT keyword to remove duplicate rows from the result set.
Example:
SELECT DISTINCT Department FROM SKU_DATA;
Limiting the Number of Rows Displayed
Use the TOP {NumberOfRows} function to control the number of rows returned.
Example:
SELECT TOP 10 * FROM SKU_DATA;
This query retrieves only the first 10 rows.
SQL SELECT Statement Clauses
A SQL query can contain multiple clauses that define the query’s behavior:
SELECT – Specifies which columns to retrieve.
FROM – Specifies the table(s) where data is located.
WHERE – Filters records based on a condition.
GROUP BY – Groups data based on a column(s).
HAVING – Filters grouped results (used with aggregate functions like SUM, COUNT).
ORDER BY – Sorts the final result set.
SQL WHERE Clauses That Use Character String Patterns
SQL allows pattern matching using the LIKE and NOT LIKE operators.
Wildcard symbols can be used to refine searches:
Underscore (_) wildcard → Represents a single, unspecified character in a specific position.
Percent sign (%) wildcard → Represents any sequence of characters, including spaces.
Example: Using Wildcards in a WHERE Clause
SELECT * FROM Customers WHERE LastName LIKE 'S%';
This retrieves all customers whose LastName starts with "S".
A Limitation to SQL Built-in Functions
You Cannot Mix Table Column Names with SQL Built-in Functions
Example (Incorrect usage):
SELECT CustomerName, COUNT(OrderID) FROM Orders;
This would cause an error because
CustomerName
is a column, whileCOUNT(OrderID)
is an aggregate function.To fix this, use GROUP BY:
SELECT CustomerName, COUNT(OrderID) FROM Orders GROUP BY CustomerName;
You Cannot Use Aggregate Functions in a WHERE Clause
Example (Incorrect usage):
SELECT * FROM Orders WHERE SUM(Amount) > 1000;
The WHERE clause operates on rows, while aggregate functions operate on columns.
To filter based on aggregate functions, use the HAVING clause instead:
SELECT CustomerID, SUM(Amount) FROM Orders GROUP BY CustomerID HAVING SUM(Amount) > 1000;
The Difference Between WHERE and HAVING
WHERE → Filters individual rows before they are grouped.
HAVING → Filters groups after aggregation.
SQL always processes WHERE before HAVING to avoid ambiguity.
Example: Using WHERE and HAVING Together
SELECT CustomerID, COUNT(OrderID)
FROM Orders
WHERE OrderDate > '2024-01-01'
GROUP BY CustomerID
HAVING COUNT(OrderID) > 5;
The WHERE clause filters orders placed after January 1, 2024.
The HAVING clause filters customers who have more than 5 orders.