AM

Advanced SQL Topics and Query Design

Multiple-Table Queries & Relational Power

  • Relational DBMSs show their real strength when queries touch several tables connected by primary-foreign key pairs.
    • Ex: In Pine Valley Furniture (PVF) database, Order_T.CustomerID (FK) references Customer_T.CustomerID (PK) ⇒ lets us list all orders for each customer.
  • SQL makes this possible through:
    • Joins (INNER, OUTER, FULL, LEFT, RIGHT, CROSS, UNION, NATURAL).
    • Subqueries (nested, correlated, in WHERE/HAVING, or even in FROM as derived tables).
    • Set-operations (UNION, INTERSECT, MINUS) that merge whole result sets.
  • SELECT’s WHERE or FROM … JOIN clauses host the join predicates.
    • One ON / WHERE condition per table-pair; joining A,B,C requires two conditions A!\text{–}B and B!\text{–}C.

Join Types & Canonical Syntax

  • INNER JOIN ⇒ returns rows that satisfy the join predicate in all tables.
    • ANSI form: … FROM A INNER JOIN B ON A.pk = B.fk.
    • Older style: list tables in FROM, place predicates in WHERE.
    • USING (if column names identical) shortens syntax: … JOIN B USING (CustomerID).
  • OUTER JOINs guarantee preservation of non-matching rows:
    • LEFT: all rows of 1st table + matches in 2nd; unmatched 2nd-side columns become NULL.
    • RIGHT: opposite of LEFT.
    • FULL: union of LEFT + RIGHT; unmatched columns on either side become NULL.
  • CROSS JOIN (Cartesian) ⇒ explicit |A| \times |B| combinations; implicit when WHERE clause is missing.
    • Ex: 10 orders × 15 customers ⇒ 150 rows, rarely useful.
  • UNION JOIN ⇒ concatenates vertically the attributes of two tables (less widely supported).
  • NATURAL keyword (optional) triggers an equi-join on all identically named columns and removes duplicates from output.
    • DBMS chooses join order; optimizer weighs indexes, 1:M distribution, etc.

Equi-Join & Natural Join Deep-Dive

  • Equi-join predicate is equality: A.col = B.col.
    • Example (PVF):
      sql SELECT Customer_T.CustomerID, Order_T.CustomerID, CustomerName, OrderID FROM Customer_T, Order_T WHERE Customer_T.CustomerID = Order_T.CustomerID ORDER BY OrderID;
    • Includes both matching columns; useful for troubleshooting.
  • Without WHERE ⇒ Cartesian product.
  • Natural join version (duplicates removed):
  SELECT Customer_T.CustomerID,
         CustomerName,
         OrderID
  FROM   Customer_T NATURAL JOIN Order_T
  ON     Customer_T.CustomerID = Order_T.CustomerID;
  • Always qualify ambiguous names (e.g., CustomerID) when both tables own the column.

Outer Joins & NULL Handling

  • Illustrate missing matches: some customers placed no orders; some orders may reference defunct customers.
  • LEFT OUTER example:
  SELECT C.CustomerID, CustomerName, OrderID
  FROM   Customer_T C LEFT OUTER JOIN Order_T O
         ON C.CustomerID = O.CustomerID;
  • Customers without orders show OrderID = NULL.
    • RIGHT OUTER mirrors above; FULL OUTER merges both row sets.
    • DBMS differences: vendor proprietary syntax historically (Oracle’s (+), SQL Server’s *=) → ANSI syntax now preferred.
    • Joining >2 tables with OUTER JOINs: test carefully—results may differ among vendors due to join-reordering rules.

Complex Joins: Four-Table Invoice & Self-Join

  • Business invoice (Order 1006) needs data from CustomerT, OrderT, OrderLineT, ProductT.
  SELECT C.CustomerID, CustomerName, CustomerAddress, CustomerCity,
         CustomerState, CustomerPostalCode,
         O.OrderID, OrderDate,
         OrderedQuantity,
         ProductDescription,
         StandardPrice,
         (OrderedQuantity * ProductStandardPrice) AS LineTotal
  FROM   Customer_T  C,
         Order_T     O,
         OrderLine_T L,
         Product_T   P
  WHERE  O.CustomerID = C.CustomerID
    AND  O.OrderID    = L.OrderID
    AND  L.ProductID  = P.ProductID
    AND  O.OrderID    = 1006;
  • Self-join solves unary relationships.
    • PVF Employee_T where EmployeeSupervisor FK points back to EmployeeID.
  SELECT E.EmployeeID,
         E.EmployeeName,
         M.EmployeeName AS Manager
  FROM   Employee_T E, Employee_T M
  WHERE  E.EmployeeSupervisor = M.EmployeeID;
  • Also answers “employees earning more than their manager”, “married to manager”, etc.
  • Ensure columns share domain semantics.

Subqueries & Derived Tables

  • Subquery lives in WHERE/HAVING and feeds outer SELECT.
    • Non-correlated computed once.
    • Correlated recomputed per outer row (uses outer columns).
  • Basic join vs. subquery illustration (order 1008):
    • Join:
      sql … FROM Customer_T, Order_T WHERE Customer_T.CustomerID = Order_T.CustomerID AND OrderID = 1008;
    • Equivalent subquery:
      sql … FROM Customer_T WHERE CustomerID = (SELECT CustomerID FROM Order_T WHERE OrderID = 1008);
  • Operators for list-returning subqueries: IN, NOT IN, EXISTS, NOT EXISTS.
    • Ex: customers who bought at least one product ⇒
      sql SELECT CustomerName FROM Customer_T WHERE CustomerID IN (SELECT DISTINCT CustomerID FROM Order_T);
  • Derived Table in FROM creates a temporary result you can join/filter again.
    • Average price example:
      sql SELECT ProductDescription, ProductStandardPrice, AvgPrice FROM (SELECT AVG(ProductStandardPrice) AS AvgPrice FROM Product_T) AS A, Product_T WHERE ProductStandardPrice > AvgPrice;
    • Allows both detail and aggregate values in same row.

Combining Result Sets

  • UNION vertically appends compatible SELECT outputs (same column count & data types).
    • CAST helps align types: CAST(OrderDate AS CHAR).
    • Sample: customers with largest vs. smallest OrderedQuantity using two queries + UNION.
  • INTERSECT ⇒ common rows; MINUS (or EXCEPT) ⇒ rows in 1st not in 2nd.
  • Final ordering is done after UNION with ORDER BY across whole union.

Conditional Expressions (CASE, NULLIF, COALESCE)

  • CASE provides IF-THEN-ELSE inside SQL.
    • Simple example hides descriptions when ProductLine ≠ 1:
      sql SELECT CASE WHEN ProductLine = 1 THEN ProductDescription ELSE '####' END AS ProductDescription FROM Product_T;
  • Predicate-based CASE supports three-valued logic.
  • NULLIF(a,b) ⇒ returns NULL if a = b else a (good for divide-by-zero avoidance).
  • COALESCE(a,b,c,…) ⇒ first non-NULL argument.

Crafting & Debugging Complex Queries

  • Break huge statements into views or CTEs; chain them.
    • Example view TSales totals quantity by salesperson/product; later correlated subquery picks best-seller for each salesperson.
  • TOP / LIMIT / FETCH FIRST n PERCENT syntax returns first n rows after ORDER BY (SQL:2003).
  • Typical error checks: Cartesian joins, wrong row counts, duplicates, bad grouping.
  • Development workflow:
    1. Identify target attributes.
    2. Locate tables in ERD.
    3. Draft SELECT & FROM list.
    4. Add join predicates.
    5. Layer filters (WHERE, GROUP BY, HAVING, DISTINCT).
    6. Test with small sample data; compare to manual totals.
  • Efficiency hints:
    • Do not SELECT *; pull only necessary columns.
    • Minimize correlated subqueries; prefer joins or derived tables.
    • Combine multiple report needs into one query to reuse scans.

Guidelines for Better Query Design & Performance

  • Index usage: Most DBMS engines exploit one index per table per query; equality predicates usually best.
  • Keep optimizer statistics refreshed (e.g., ANALYZE, UPDATE STATISTICS).
  • Match literal data types to column data types to avert conversions.
  • Break extremely complex predicates into several simple queries united by UNION.
  • Avoid self-joins by copying table to temp when feasible; delete temp after use.
  • Build temporary tables for intermediate result reuse across many queries.
  • Avoid DBMS sort without index—export and sort externally if necessary.
  • Use EXPLAIN to study cost plans; record timing baselines.
  • Run heavy ad-hoc reports during off-peak hours.

Views (Dynamic, Materialized)

  • Base table = physically stored.
  • View = virtual SELECT stored in catalog, rebuilt on each reference; uses no storage, always current.
  • Materialized view = result physically stored, refreshed (time-based or event-based); speeds distributed or aggregate queries.
  • Benefits of views:
    • Hide complexity; enforce security by projecting/renaming columns.
    • Provide backward-compatible logical schemas (physical independence).
    • Improve programmer productivity.
  • CREATE VIEW example – invoice projection:
  CREATE VIEW Invoice_V AS
  SELECT C.CustomerID, CustomerAddress,
         O.OrderID, P.ProductID,
         ProductStandardPrice, OrderedQuantity
  FROM   Customer_T C, Order_T O, OrderLine_T L, Product_T P
  WHERE  C.CustomerID = O.CustomerID
    AND  O.OrderID    = L.OrderID
    AND  P.ProductID  = L.ProductID;
  • Querying the view same as table; can layer further views (e.g., OrderTotals_V).
  • WITH CHECK OPTION prevents updates through the view that would violate its WHERE condition.
  • System tables (Oracle DBA_VIEWS, SQL Server sys.views, Information_Schema views) store definition text.

Triggers 101

  • Stored procedural objects that fire automatically on DB events (DML or DDL).
    • Event + Condition + Action pattern.
  • Use-cases: enforce referential / business rules, audit trails, replicate data, custom alerts.
  • BEFORE vs. AFTER vs. INSTEAD OF timing.
  • Row-level vs. statement-level (Oracle allows both; SQL Server only statement-level).
  • DDL trigger example (prevents accidental schema change):
  CREATE TRIGGER safety
  ON DATABASE
  FOR DROP_TABLE, ALTER_TABLE
  AS
  PRINT 'Disable "safety" trigger first.';
  ROLLBACK;
  • Caution: cascading chains, infinite loops, silent privilege escalations (security holes).
  • Constraints are faster & simpler; prefer them when rule fits standard uniqueness / FK / CHECK semantics.

Routines & SQL/PSM (Persistent Stored Modules)

  • Procedure: callable module; may have IN, OUT, INOUT params; returns 0–many result sets.
  • Function: returns single scalar/aggregate; only IN params; can appear in SELECT list.
  • Flow control constructs: CASE, IF-THEN/ELSEIF, LOOP/WHILE/REPEAT, FOR-each-row, LEAVE, ITERATE.
  • Advantages: shared business logic, server-side execution, reduced client traffic.
  • Vendor dialects:
    • Oracle – PL/SQL
    • SQL Server – T-SQL
    • MySQL / MariaDB – stored routines
    • PostgreSQL – PL/pgSQL, PL/v8, etc.
  • Example (Oracle PL/SQL) – mark sale price:
  ALTER TABLE Product_T ADD (SalePrice DECIMAL(6,2));
  CREATE OR REPLACE PROCEDURE ProductLineSale AS
  BEGIN
      UPDATE Product_T
      SET    SalePrice = 0.90 * ProductStandardPrice
      WHERE  ProductStandardPrice >= 400;
      UPDATE Product_T
      SET    SalePrice = 0.85 * ProductStandardPrice
      WHERE  ProductStandardPrice < 400;
  END;
  EXEC ProductLineSale;
  • Demonstrates set-oriented bulk UPDATE vs. row-oriented host languages.

Data Dictionary & Metadata Queries

  • Every RDBMS stores schema info in system tables (catalog), accessible via SELECT.
    • Oracle 12c: DBA_TABLES, DBA_TAB_COLUMNS, DBA_CONSTRAINTS, DBA_USERS, V$ performance views, USER/ALL/DBA prefix views.
    • SQL Server 2016: sys.objects, sys.columns, dynamic management views (DMVs), information_schema, compatibility views.
  • Example (Oracle):
  SELECT OWNER, TABLE_NAME
  FROM   DBA_TABLES
  WHERE  TABLE_NAME = 'PRODUCT_T';
  • Example (SQL Server): objects modified in last 10 days:
  SELECT name, SCHEMA_NAME(schema_id) AS schema_name,
         type_desc, create_date, modify_date
  FROM   sys.objects
  WHERE  modify_date > GETDATE() - 10;

SQL Standard Evolution Highlights

  • SQL:1999 – added triggers, PSM, UDTs, flow control.
  • SQL:2008 – standardized OLAP/statistical functions, new data types, preliminary SQL/XML.
  • SQL:2011 – major temporal support: application-time, system-time, bi-temporal tables; predicates: CONTAINS, OVERLAPS, PRECEDES, SUCCEEDS, IMMEDIATELY … .
    • Application-time table sample:
      sql CREATE TABLE Employee_T( EmpNbr NUMBER(11,0), EmpStart DATE, EmpEnd DATE, EmpDept NUMBER(11,0), PERIOD FOR EmpPeriod (EmpStart, EmpEnd), PRIMARY KEY (EmpNbr, EmpStart, EmpEnd) );
  • SQL:2016 – enhancements:
    • Row Pattern Recognition (MATCH_RECOGNIZE) for time-series.
    • Trig/log functions, broader JSON support.
    • Polymorphic Table Functions (PTFs) – table-valued functions whose output row-type is determined at runtime.

Other Noteworthy Additions

  • MERGE (SQL:2008) – UPSERT in single statement.
  MERGE INTO Customer_T AS Cust
  USING (SELECT … FROM CustTrans_T) AS CT
  ON (Cust.CustomerID = CT.CustomerID)
  WHEN MATCHED THEN UPDATE …
  WHEN NOT MATCHED THEN INSERT …;
  • CREATE TABLE LIKE … INCLUDING COLUMN DEFAULTS – copy structure fast.

Analytical & OLAP Functions

  • Part of SQL:2008 – built-in linear regression, correlation, moving averages, window aggregates.
    • Calculations stay inside DB ⇒ minimizes ETL to analytical tools.
    • Vendors converging on common syntax.

Quick Reference Cheat-Sheet

  • Equi-join duplicates cols; Natural join drops them.
  • LEFT / RIGHT / FULL OUTER keep unmatched rows with NULL placeholders.
  • CROSS JOIN = Cartesian product.
  • Subquery returns scalar / list / EXISTS boolean; correlated ⇒ inner depends on outer row.
  • Derived table = subquery in FROM, can include aggregates.
  • UNION / INTERSECT / MINUS combine sets; require column compatibility.
  • CASE / NULLIF / COALESCE enable conditional or default logic.
  • Views virtualize queries; materialized views cache them.
  • Trigger = auto-fired; Routine = called; both written in SQL/PSM or vendor PL.
  • SQL standards continually add analytics, temporal, JSON, pattern matching, PTF.
  • Performance basics: match data types, refresh stats, fetch only needed columns, watch for Cartesian joins & correlated subqueries.