AM

Comprehensive SQL Lecture Notes

SQL OVERVIEW

  • Structured Query Language (SQL) = standard language for Relational DataBase Management Systems (RDBMSs)
    • Used to create, manipulate, and control relational databases
    • Runs on mainframes, servers, PCs, mobile and cloud platforms
    • May be invisible to end-users (hidden behind GUIs, forms, or QBE interfaces)
  • Historical milestones
    • 1970 : E. F. Codd proposes the relational model
    • 1974{-}1979 : IBM System R proves relational feasibility ➜ internal query language “Sequel” ⇒ SQL
    • 1979 : Oracle ships the first commercial DBMS supporting SQL
    • 1981 : IBM SQL/DS released
    • 1986 : ANSI / ISO ratify first standard – SQL/86 (a.k.a. SQL-1)
    • Later revisions: SQL/89, SQL/92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016

MOTIVATION FOR A FORMAL STANDARD

  • Purposes
    • Define common syntax & semantics
    • Specify minimum/complete feature sets for portability
    • Provide basis for enhancements
  • Advantages
    • Lower training costs (one language)
    • Developer productivity & skill longevity
    • Application portability/longevity
    • Reduced vendor lock-in ➜ healthy competition & lower TCO
    • Cross-system communication (heterogeneous DB links)
  • Disadvantages / criticisms
    • May stifle innovation; hard to change (vested vendor interests)
    • Vendors add proprietary extensions ➜ portability suffers
    • Early versions lacked referential integrity & several relational operators

MARKET LANDSCAPE & EMERGING TECHNOLOGIES

  • Mature relational market; dominant products (mid-2020): Oracle, MySQL, Microsoft SQL Server
  • Key open-source RDBMSs: MySQL, PostgreSQL, MariaDB
  • NoSQL & Hadoop ecosystems rising, but many still expose SQL or SQL-like layers (Hive QL, Spark SQL)

THE SQL ENVIRONMENT

  • Components of an SQL-based application
    • User interface (form, report, GUI, web page…)
    • Database tables (base tables)
    • RDBMS executing SQL engine
  • Catalog hierarchy
    • Catalog ⇒ Collection of schemas ⇒ Objects (tables, views, indexes, constraints…)
    • Information Schema = meta-catalog of all objects (tables, attributes, privileges…)
  • SQL command families
    • DDL (Data Definition) – CREATE, ALTER, DROP
    • DML (Data Manipulation) – SELECT, INSERT, UPDATE, DELETE, MERGE
    • DCL (Data Control) – GRANT, REVOKE; plus transaction control (COMMIT, ROLLBACK)

SQL DATA TYPES

  • Numeric: INTEGER, SMALLINT, BIGINT, NUMERIC(p,s), DECIMAL(p,s), DECFLOAT
  • Character
    • Fixed length: CHARACTER(n) / CHAR(n)
    • Variable: CHARACTER VARYING(n) / VARCHAR(n) (Oracle VARCHAR2)
  • Binary Large Objects: BLOB, CLOB, NCLOB, BFILE
  • Temporal: DATE, TIME, TIMESTAMP, TIMESTAMP WITH LOCAL TIME ZONE
  • Boolean: BOOLEAN (TRUE/FALSE/UNKNOWN)
  • Advanced (SQL:1999-2016): XML, MULTISET, JSON (vendor), spatial, graphic, image
  • Good practice
    • Choose data type by intended operations (math ⇒ numeric, searching ⇒ proper length, date arithmetic ⇒ temporal)
    • Use domains & CHECK constraints for semantic validation

DEFINING DATABASE STRUCTURES (DDL)

  • Creating a schema: CREATE SCHEMA schema_name AUTHORIZATION owner_userid;
  • Command summary
    • CREATE TABLE – base or derived (from AS SELECT)
    • CREATE VIEW – logical table; may be updateable with restrictions
    • CREATE INDEX – performance aid (primary / secondary, single / composite)
    • CREATE DOMAIN, CREATE CHARACTER SET, CREATE COLLATION, CREATE TRANSLATION, CREATE ASSERTION
  • Reversing objects – DROP TABLE | VIEW | INDEX … [RESTRICT | CASCADE]
  • Modifying structures – ALTER TABLE (ADD column, ALTER column, DROP column, ADD constraint)

TABLE DESIGN & CONSTRAINTS

  • Column definition clause: column_name data_type [NOT NULL] [DEFAULT expr] [constraint]
  • Key constraints
    • PRIMARY KEY (col [, …]) – unique + not null
    • UNIQUE (col [, …]) – alternate candidate keys
    • FOREIGN KEY … REFERENCES parent(col) – referential integrity
  • Referential integrity actions
    • ON UPDATE / ON DELETE { RESTRICT | CASCADE | SET NULL | SET DEFAULT }
  • Additional constraints
    • CHECK (predicate) – enforce domain/business rule
    • Named constraints strongly recommended: CONSTRAINT Order_PK PRIMARY KEY (OrderID)
  • Identity / auto-incrementing primary keys (SQL:2008):
  CustomerID INTEGER GENERATED ALWAYS AS IDENTITY
      (START WITH 1 INCREMENT BY 1)
  • Copy structure: CREATE TABLE New_T LIKE Old_T; (SQL:2008 allows copying constraints)
  • Dual table (Oracle/MySQL): dummy one-row table for selecting expressions (SELECT SYSDATE FROM DUAL;)

CHANGING & REMOVING TABLES

  • ALTER TABLE … usage examples
    • Add column: ALTER TABLE Customer_T ADD CustomerType VARCHAR2(10) DEFAULT 'Commercial';
    • Drop column: ALTER TABLE … DROP COLUMN colname;
    • Add / drop constraints as needed (often disabled temporarily for bulk loads)
  • Removing data structures
    • DROP TABLE TableName; (plus RESTRICT / CASCADE)
    • TRUNCATE TABLE – delete all rows but keep structure (non-logged, fast)

INSERTING, UPDATING & DELETING DATA (DML)

  • INSERT
    • Full-row: INSERT INTO Customer_T VALUES (001,'Contemporary Casuals',…);
    • Partial-row: INSERT INTO Product_T (ProductID, ProductDescription, …) VALUES (1,'End Table',…);
    • Copy rows: INSERT INTO CaCustomer_T SELECT * FROM Customer_T WHERE CustomerState = 'CA';
  • Batch loaders
    • Oracle SQL*Loader, SQL Server BULK INSERT, MySQL LOAD DATA INFILE
  • DELETE
    • Conditional: DELETE FROM Customer_T WHERE CustomerState = 'HI';
    • All rows: DELETE FROM Customer_T; (or TRUNCATE TABLE)
    • Always test with SELECT … first to preview
  • UPDATE
    • Example: UPDATE Product_T SET ProductStandardPrice = 775 WHERE ProductID = 7;
    • Set to NULL: SET column = NULL
    • MERGE (UPSERT) SQL:2008 syntax combines INSERT & UPDATE (see transcript example)

INTERNAL SCHEMA & PERFORMANCE OPTIONS

  • Indexing guidelines
    • Create indexes on primary & frequently-searched secondary keys ➜ faster SELECT, slower INSERT/UPDATE
    • Types: unique / non-unique / full-text / spatial / hash (MySQL), ascending/descending
    • Oracle syntax: CREATE INDEX Name_IDX ON Customer_T (CustomerName);
    • Drop when bulk-loading or if seldom used: DROP INDEX Name_IDX;
  • Physical file organisation
    • Choose clustering, partitioning, sorted files to match workload
    • Allocate extra space for growth; maintain statistics for optimiser

QUERYING SINGLE TABLES – SELECT STATEMENT

  • Basic clause order to write: SELECT … FROM … WHERE … ORDER BY …;
  • Actual processing order: \text{FROM} \rightarrow \text{WHERE} \rightarrow \text{GROUP BY} \rightarrow \text{HAVING} \rightarrow \text{SELECT} \rightarrow \text{ORDER BY}
  • Key clauses
    • SELECT list – columns / expressions / DISTINCT / *
    • FROM – base or derived tables, views
    • WHERE – row selection joins, predicates, subqueries
    • ORDER BY – final sort (ASC default / DESC)
    • GROUP BY – form groups for aggregates; columns in SELECT must appear in GROUP BY unless aggregated
    • HAVING – filter groups (aggregate predicates)
  • Aliases improve readability: SELECT CUST.CustomerName AS Name … FROM Customer_T CUST …

EXPRESSIONS & FUNCTIONS

  • Operators: +, –, *, /, % (modulo)
  • Example: price plus 10 %: ProductStandardPrice*1.1 AS Plus10Percent
  • Precedence: Parentheses → * / → + -
  • Scalar functions
    • Math: ROUND, TRUNC, MOD, POWER, SQRT, LN, EXP, FLOOR, CEILING
    • Aggregate: COUNT, SUM, AVG, MIN, MAX
    • String: LOWER, UPPER, INITCAP, CONCAT, SUBSTR, COALESCE
    • Date: NEXT_DAY, ADD_MONTHS, MONTHS_BETWEEN
    • Analytical (vendor): TOP, WIDTH_BUCKET, window functions (not detailed here)

COMPARISON, LOGIC & PATTERN MATCHING

  • Comparison operators: =, <>, !=, >, >=, <, <=
  • Wildcards with LIKE
    • % any string; _ single char
    • LIKE '%Desk' ➜ ends with ‘Desk’
    • LIKE '_-drawer' ➜ pattern such as “3-drawer”
  • Boolean operators
    • NOT evaluated first ➜ ANDOR (use parentheses to override)
  • Ranges & lists
    • BETWEEN low AND high / NOT BETWEEN
    • IN (… list …) / subquery lists; NOT IN
  • DISTINCT removes duplicate rows in final projection (only once per query)
  • NULL predicates: IS NULL, IS NOT NULL (NULL ≠ zero or blank)

SORTING & PAGINATION

  • ORDER BY column1 [, column2] [DESC] (position numbers allowed: ORDER BY 3,1)
  • LIMIT / OFFSET clauses (MySQL) or FETCH FIRST n ROWS ONLY (Oracle 12c):
  ORDER BY 3,1 LIMIT 30,5; -- skip 30, fetch 5
  • NULL sort order implementation-dependent (some allow NULLS FIRST|LAST)

GROUPING & AGGREGATION

  • Scalar aggregate = single result, vector aggregate = multiple per group
  • Rule: every SELECT column must appear in GROUP BY unless it is an aggregate argument
  • HAVING filters groups using aggregates
  • Example workflow
    1. FROM – build working set
    2. WHERE – row filter
    3. GROUP BY – create groups
    4. HAVING – group filter
    5. SELECT – projection & calculations
    6. ORDER BY – final sort
  • Debug tip: build query progressively in processing order, check intermediate results

TYPICAL EXAMPLES IN CONTEXT

  • Find products costing
  SELECT ProductDescription, ProductStandardPrice
  FROM   Product_T
  WHERE  ProductStandardPrice < 275;
  • Count order lines for Order 1004: SELECT COUNT(*) FROM OrderLine_T WHERE OrderID = 1004;
  • Customers per state, only states with >1 customer:
  SELECT CustomerState, COUNT(*)
  FROM   Customer_T
  GROUP  BY CustomerState
  HAVING COUNT(*) > 1;

ETHICAL / PRACTICAL IMPLICATIONS & BEST PRACTICES

  • Referential integrity prevents orphaned data → quality & trust in analytics
  • Proper typing & constraints reduce application-level validation code
  • Over-indexing harms write performance; balance speed vs. cost
  • Standardisation fosters skill portability but watch proprietary features
  • Always test DML on small sets or within transactions; rollback if unexpected
  • Provide meaningful names for objects & constraints to ease maintenance

RECAP & KEY TAKEAWAYS

  • SQL serves three roles: DDL, DML, DCL
  • Standard evolution continues; SQL:2016 adds JSON, temporal tables, row pattern matching
  • Catalog → Schema → Objects hierarchy central to metadata management
  • Designing tables involves choosing right data types, primary/foreign keys, constraints & indexes
  • CRUD operations (INSERT, UPDATE, DELETE) obey referential constraints; MERGE simplifies upserts
  • SELECT is declarative & set-oriented; mastering WHERE, GROUP BY, HAVING, ORDER BY, functions & subqueries enables powerful analytics
  • Performance tuning rests on indexing, physical design, statistics, and query rewriting
  • Always consider null semantics, Boolean logic precedence, and standard vs. vendor-specific syntax during development