0.0(0)

Chapter 3 Data Management


Retail Sales and Catalog Content

Data Extraction
  • The Cape Codd marketing department requires an analysis of:

    • In-store sales – Data on sales that occur in physical store locations.

    • Catalog content – Information about products available in catalogs.

  • Instead of accessing the entire database, only a relevant subset of retail sales data and catalog content is needed.

  • The IS department (Information Systems department) extracts the required data from the operational database and stores it in a separate, offline database for the marketing team to analyze.

    • Operational Database – A live database that supports real-time transactions.

    • Offline Database – A database used for reporting and analysis, separate from real-time operations.


SQL as a Data Sublanguage

  • SQL (Structured Query Language) is not a full-fledged programming language; instead, it is a data sublanguage.

  • SQL is specifically designed for creating and processing database data and metadata.

  • SQL is ubiquitous in modern enterprise-class DBMS (Database Management Systems).


SQL Categories

SQL statements can be grouped into five categories based on their function:

  1. Data Definition Language (DDL)

    • Used for creating database structures, including tables and relationships.

    • Example: CREATE TABLE, ALTER TABLE, DROP TABLE

  2. Data Manipulation Language (DML)

    • Used for inserting, querying, modifying, and deleting data.

    • Example: SELECT, INSERT, UPDATE, DELETE

  3. SQL/Persistent Stored Modules (SQL/PSM)

    • Extends SQL by adding procedural programming features like loops and conditional logic.

    • Example: BEGIN...END, DECLARE, IF...THEN

  4. Transaction Control Language (TCL)

    • Used for managing transactions (groups of SQL operations that must be executed together).

    • Example: COMMIT, ROLLBACK, SAVEPOINT

  5. Data Control Language (DCL)

    • Used to grant or revoke access permissions for users.

    • Example: GRANT, REVOKE


The SQL SELECT/FROM/WHERE Framework

  • The core SQL query structure is the SELECT statement, which retrieves data from a database.

    • SELECT {ColumnName(s)} → Specifies which columns to retrieve.

    • FROM {TableName(s)} → Specifies which table(s) the data comes from.

    • WHERE {Condition(s)} → Filters the data based on specified conditions.


Retrieving Data Using SQL

Selecting Specific Columns from a Single Table

Example:

SELECT SKU, SKU_Description, Department, Buyer
FROM SKU_DATA;
  • This query retrieves only the SKU, SKU_Description, Department, and Buyer columns from the SKU_DATA table.

Selecting All Columns Using the SQL Asterisk (*) Wildcard

Example:

SELECT * FROM SKU_DATA;
  • The * wildcard selects all columns from the SKU_DATA table.

Eliminating Duplicates
  • Use the DISTINCT keyword to remove duplicate rows from the result set.
    Example:

SELECT DISTINCT Department FROM SKU_DATA;
Limiting the Number of Rows Displayed
  • Use the TOP {NumberOfRows} function to control the number of rows returned.
    Example:

SELECT TOP 10 * FROM SKU_DATA;
  • This query retrieves only the first 10 rows.


SQL SELECT Statement Clauses

A SQL query can contain multiple clauses that define the query’s behavior:

  1. SELECT – Specifies which columns to retrieve.

  2. FROM – Specifies the table(s) where data is located.

  3. WHERE – Filters records based on a condition.

  4. GROUP BY – Groups data based on a column(s).

  5. HAVING – Filters grouped results (used with aggregate functions like SUM, COUNT).

  6. ORDER BY – Sorts the final result set.


SQL WHERE Clauses That Use Character String Patterns

  • SQL allows pattern matching using the LIKE and NOT LIKE operators.

  • Wildcard symbols can be used to refine searches:

    • Underscore (_) wildcard → Represents a single, unspecified character in a specific position.

    • Percent sign (%) wildcard → Represents any sequence of characters, including spaces.

Example: Using Wildcards in a WHERE Clause
SELECT * FROM Customers WHERE LastName LIKE 'S%';
  • This retrieves all customers whose LastName starts with "S".


A Limitation to SQL Built-in Functions

  1. You Cannot Mix Table Column Names with SQL Built-in Functions

    • Example (Incorrect usage):

    SELECT CustomerName, COUNT(OrderID) FROM Orders;
    • This would cause an error because CustomerName is a column, while COUNT(OrderID) is an aggregate function.

    • To fix this, use GROUP BY:

      SELECT CustomerName, COUNT(OrderID) 
      FROM Orders
      GROUP BY CustomerName;
  2. You Cannot Use Aggregate Functions in a WHERE Clause

    • Example (Incorrect usage):

    SELECT * FROM Orders WHERE SUM(Amount) > 1000;
    • The WHERE clause operates on rows, while aggregate functions operate on columns.

    • To filter based on aggregate functions, use the HAVING clause instead:

      SELECT CustomerID, SUM(Amount) 
      FROM Orders
      GROUP BY CustomerID
      HAVING SUM(Amount) > 1000;

The Difference Between WHERE and HAVING

  • WHERE → Filters individual rows before they are grouped.

  • HAVING → Filters groups after aggregation.

  • SQL always processes WHERE before HAVING to avoid ambiguity.

Example: Using WHERE and HAVING Together
SELECT CustomerID, COUNT(OrderID) 
FROM Orders
WHERE OrderDate > '2024-01-01'
GROUP BY CustomerID
HAVING COUNT(OrderID) > 5;
  • The WHERE clause filters orders placed after January 1, 2024.

  • The HAVING clause filters customers who have more than 5 orders.


0.0(0)
robot