The Cape Codd marketing department requires an analysis of:
In-store sales – Data on sales that occur in physical store locations.
Catalog content – Information about products available in catalogs.
Instead of accessing the entire database, only a relevant subset of retail sales data and catalog content is needed.
The IS department (Information Systems department) extracts the required data from the operational database and stores it in a separate, offline database for the marketing team to analyze.
Operational Database – A live database that supports real-time transactions.
Offline Database – A database used for reporting and analysis, separate from real-time operations.
SQL (Structured Query Language) is not a full-fledged programming language; instead, it is a data sublanguage.
SQL is specifically designed for creating and processing database data and metadata.
SQL is ubiquitous in modern enterprise-class DBMS (Database Management Systems).
SQL statements can be grouped into five categories based on their function:
Data Definition Language (DDL)
Used for creating database structures, including tables and relationships.
Example: CREATE TABLE
, ALTER TABLE
, DROP TABLE
Data Manipulation Language (DML)
Used for inserting, querying, modifying, and deleting data.
Example: SELECT
, INSERT
, UPDATE
, DELETE
SQL/Persistent Stored Modules (SQL/PSM)
Extends SQL by adding procedural programming features like loops and conditional logic.
Example: BEGIN...END
, DECLARE
, IF...THEN
Transaction Control Language (TCL)
Used for managing transactions (groups of SQL operations that must be executed together).
Example: COMMIT
, ROLLBACK
, SAVEPOINT
Data Control Language (DCL)
Used to grant or revoke access permissions for users.
Example: GRANT
, REVOKE
The core SQL query structure is the SELECT statement, which retrieves data from a database.
SELECT {ColumnName(s)} → Specifies which columns to retrieve.
FROM {TableName(s)} → Specifies which table(s) the data comes from.
WHERE {Condition(s)} → Filters the data based on specified conditions.
Example:
SELECT SKU, SKU_Description, Department, Buyer
FROM SKU_DATA;
This query retrieves only the SKU
, SKU_Description
, Department
, and Buyer
columns from the SKU_DATA
table.
Example:
SELECT * FROM SKU_DATA;
The *
wildcard selects all columns from the SKU_DATA
table.
Use the DISTINCT keyword to remove duplicate rows from the result set.
Example:
SELECT DISTINCT Department FROM SKU_DATA;
Use the TOP {NumberOfRows} function to control the number of rows returned.
Example:
SELECT TOP 10 * FROM SKU_DATA;
This query retrieves only the first 10 rows.
A SQL query can contain multiple clauses that define the query’s behavior:
SELECT – Specifies which columns to retrieve.
FROM – Specifies the table(s) where data is located.
WHERE – Filters records based on a condition.
GROUP BY – Groups data based on a column(s).
HAVING – Filters grouped results (used with aggregate functions like SUM, COUNT).
ORDER BY – Sorts the final result set.
SQL allows pattern matching using the LIKE and NOT LIKE operators.
Wildcard symbols can be used to refine searches:
Underscore (_) wildcard → Represents a single, unspecified character in a specific position.
Percent sign (%) wildcard → Represents any sequence of characters, including spaces.
SELECT * FROM Customers WHERE LastName LIKE 'S%';
This retrieves all customers whose LastName starts with "S".
You Cannot Mix Table Column Names with SQL Built-in Functions
Example (Incorrect usage):
SELECT CustomerName, COUNT(OrderID) FROM Orders;
This would cause an error because CustomerName
is a column, while COUNT(OrderID)
is an aggregate function.
To fix this, use GROUP BY:
SELECT CustomerName, COUNT(OrderID)
FROM Orders
GROUP BY CustomerName;
You Cannot Use Aggregate Functions in a WHERE Clause
Example (Incorrect usage):
SELECT * FROM Orders WHERE SUM(Amount) > 1000;
The WHERE clause operates on rows, while aggregate functions operate on columns.
To filter based on aggregate functions, use the HAVING clause instead:
SELECT CustomerID, SUM(Amount)
FROM Orders
GROUP BY CustomerID
HAVING SUM(Amount) > 1000;
WHERE → Filters individual rows before they are grouped.
HAVING → Filters groups after aggregation.
SQL always processes WHERE before HAVING to avoid ambiguity.
SELECT CustomerID, COUNT(OrderID)
FROM Orders
WHERE OrderDate > '2024-01-01'
GROUP BY CustomerID
HAVING COUNT(OrderID) > 5;
The WHERE clause filters orders placed after January 1, 2024.
The HAVING clause filters customers who have more than 5 orders.