Comprehensive SQL Lecture Notes
SQL OVERVIEW
- Structured Query Language (SQL) = standard language for Relational DataBase Management Systems (RDBMSs)
- Used to create, manipulate, and control relational databases
- Runs on mainframes, servers, PCs, mobile and cloud platforms
- May be invisible to end-users (hidden behind GUIs, forms, or QBE interfaces)
- Historical milestones
- 1970 : E. F. Codd proposes the relational model
- 1974{-}1979 : IBM System R proves relational feasibility ➜ internal query language “Sequel” ⇒ SQL
- 1979 : Oracle ships the first commercial DBMS supporting SQL
- 1981 : IBM SQL/DS released
- 1986 : ANSI / ISO ratify first standard – SQL/86 (a.k.a. SQL-1)
- Later revisions: SQL/89, SQL/92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016
- Purposes
- Define common syntax & semantics
- Specify minimum/complete feature sets for portability
- Provide basis for enhancements
- Advantages
- Lower training costs (one language)
- Developer productivity & skill longevity
- Application portability/longevity
- Reduced vendor lock-in ➜ healthy competition & lower TCO
- Cross-system communication (heterogeneous DB links)
- Disadvantages / criticisms
- May stifle innovation; hard to change (vested vendor interests)
- Vendors add proprietary extensions ➜ portability suffers
- Early versions lacked referential integrity & several relational operators
MARKET LANDSCAPE & EMERGING TECHNOLOGIES
- Mature relational market; dominant products (mid-2020): Oracle, MySQL, Microsoft SQL Server
- Key open-source RDBMSs: MySQL, PostgreSQL, MariaDB
- NoSQL & Hadoop ecosystems rising, but many still expose SQL or SQL-like layers (Hive QL, Spark SQL)
THE SQL ENVIRONMENT
- Components of an SQL-based application
- User interface (form, report, GUI, web page…)
- Database tables (base tables)
- RDBMS executing SQL engine
- Catalog hierarchy
- Catalog ⇒ Collection of schemas ⇒ Objects (tables, views, indexes, constraints…)
- Information Schema = meta-catalog of all objects (tables, attributes, privileges…)
- SQL command families
- DDL (Data Definition) – CREATE, ALTER, DROP
- DML (Data Manipulation) – SELECT, INSERT, UPDATE, DELETE, MERGE
- DCL (Data Control) – GRANT, REVOKE; plus transaction control (COMMIT, ROLLBACK)
SQL DATA TYPES
- Numeric: INTEGER, SMALLINT, BIGINT, NUMERIC(p,s), DECIMAL(p,s), DECFLOAT
- Character
- Fixed length: CHARACTER(n) / CHAR(n)
- Variable: CHARACTER VARYING(n) / VARCHAR(n) (Oracle VARCHAR2)
- Binary Large Objects: BLOB, CLOB, NCLOB, BFILE
- Temporal: DATE, TIME, TIMESTAMP, TIMESTAMP WITH LOCAL TIME ZONE
- Boolean: BOOLEAN (TRUE/FALSE/UNKNOWN)
- Advanced (SQL:1999-2016): XML, MULTISET, JSON (vendor), spatial, graphic, image
- Good practice
- Choose data type by intended operations (math ⇒ numeric, searching ⇒ proper length, date arithmetic ⇒ temporal)
- Use domains & CHECK constraints for semantic validation
DEFINING DATABASE STRUCTURES (DDL)
- Creating a schema:
CREATE SCHEMA schema_name AUTHORIZATION owner_userid; - Command summary
CREATE TABLE – base or derived (from AS SELECT)CREATE VIEW – logical table; may be updateable with restrictionsCREATE INDEX – performance aid (primary / secondary, single / composite)CREATE DOMAIN, CREATE CHARACTER SET, CREATE COLLATION, CREATE TRANSLATION, CREATE ASSERTION
- Reversing objects –
DROP TABLE | VIEW | INDEX … [RESTRICT | CASCADE] - Modifying structures –
ALTER TABLE (ADD column, ALTER column, DROP column, ADD constraint)
TABLE DESIGN & CONSTRAINTS
- Column definition clause:
column_name data_type [NOT NULL] [DEFAULT expr] [constraint] - Key constraints
PRIMARY KEY (col [, …]) – unique + not nullUNIQUE (col [, …]) – alternate candidate keysFOREIGN KEY … REFERENCES parent(col) – referential integrity
- Referential integrity actions
- ON UPDATE / ON DELETE { RESTRICT | CASCADE | SET NULL | SET DEFAULT }
- Additional constraints
CHECK (predicate) – enforce domain/business rule- Named constraints strongly recommended:
CONSTRAINT Order_PK PRIMARY KEY (OrderID)
- Identity / auto-incrementing primary keys (SQL:2008):
CustomerID INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1 INCREMENT BY 1)
- Copy structure:
CREATE TABLE New_T LIKE Old_T; (SQL:2008 allows copying constraints) - Dual table (Oracle/MySQL): dummy one-row table for selecting expressions (
SELECT SYSDATE FROM DUAL;)
CHANGING & REMOVING TABLES
ALTER TABLE … usage examples- Add column:
ALTER TABLE Customer_T ADD CustomerType VARCHAR2(10) DEFAULT 'Commercial'; - Drop column:
ALTER TABLE … DROP COLUMN colname; - Add / drop constraints as needed (often disabled temporarily for bulk loads)
- Removing data structures
DROP TABLE TableName; (plus RESTRICT / CASCADE)TRUNCATE TABLE – delete all rows but keep structure (non-logged, fast)
INSERTING, UPDATING & DELETING DATA (DML)
- INSERT
- Full-row:
INSERT INTO Customer_T VALUES (001,'Contemporary Casuals',…); - Partial-row:
INSERT INTO Product_T (ProductID, ProductDescription, …) VALUES (1,'End Table',…); - Copy rows:
INSERT INTO CaCustomer_T SELECT * FROM Customer_T WHERE CustomerState = 'CA';
- Batch loaders
- Oracle
SQL*Loader, SQL Server BULK INSERT, MySQL LOAD DATA INFILE
- DELETE
- Conditional:
DELETE FROM Customer_T WHERE CustomerState = 'HI'; - All rows:
DELETE FROM Customer_T; (or TRUNCATE TABLE) - Always test with
SELECT … first to preview
- UPDATE
- Example:
UPDATE Product_T SET ProductStandardPrice = 775 WHERE ProductID = 7; - Set to NULL:
SET column = NULL - MERGE (UPSERT) SQL:2008 syntax combines INSERT & UPDATE (see transcript example)
- Indexing guidelines
- Create indexes on primary & frequently-searched secondary keys ➜ faster
SELECT, slower INSERT/UPDATE - Types: unique / non-unique / full-text / spatial / hash (MySQL), ascending/descending
- Oracle syntax:
CREATE INDEX Name_IDX ON Customer_T (CustomerName); - Drop when bulk-loading or if seldom used:
DROP INDEX Name_IDX;
- Physical file organisation
- Choose clustering, partitioning, sorted files to match workload
- Allocate extra space for growth; maintain statistics for optimiser
QUERYING SINGLE TABLES – SELECT STATEMENT
- Basic clause order to write:
SELECT … FROM … WHERE … ORDER BY …; - Actual processing order: \text{FROM} \rightarrow \text{WHERE} \rightarrow \text{GROUP BY} \rightarrow \text{HAVING} \rightarrow \text{SELECT} \rightarrow \text{ORDER BY}
- Key clauses
SELECT list – columns / expressions / DISTINCT / *FROM – base or derived tables, viewsWHERE – row selection joins, predicates, subqueriesORDER BY – final sort (ASC default / DESC)GROUP BY – form groups for aggregates; columns in SELECT must appear in GROUP BY unless aggregatedHAVING – filter groups (aggregate predicates)
- Aliases improve readability:
SELECT CUST.CustomerName AS Name … FROM Customer_T CUST …
EXPRESSIONS & FUNCTIONS
- Operators: +, –, *, /, % (modulo)
- Example: price plus 10 %:
ProductStandardPrice*1.1 AS Plus10Percent - Precedence: Parentheses → * / → + -
- Scalar functions
- Math:
ROUND, TRUNC, MOD, POWER, SQRT, LN, EXP, FLOOR, CEILING - Aggregate:
COUNT, SUM, AVG, MIN, MAX - String:
LOWER, UPPER, INITCAP, CONCAT, SUBSTR, COALESCE - Date:
NEXT_DAY, ADD_MONTHS, MONTHS_BETWEEN - Analytical (vendor):
TOP, WIDTH_BUCKET, window functions (not detailed here)
COMPARISON, LOGIC & PATTERN MATCHING
- Comparison operators: =, <>, !=, >, >=, <, <=
- Wildcards with
LIKE% any string; _ single charLIKE '%Desk' ➜ ends with ‘Desk’LIKE '_-drawer' ➜ pattern such as “3-drawer”
- Boolean operators
NOT evaluated first ➜ AND ➜ OR (use parentheses to override)
- Ranges & lists
BETWEEN low AND high / NOT BETWEENIN (… list …) / subquery lists; NOT IN
DISTINCT removes duplicate rows in final projection (only once per query)- NULL predicates:
IS NULL, IS NOT NULL (NULL ≠ zero or blank)
ORDER BY column1 [, column2] [DESC] (position numbers allowed: ORDER BY 3,1)- LIMIT / OFFSET clauses (MySQL) or
FETCH FIRST n ROWS ONLY (Oracle 12c):
ORDER BY 3,1 LIMIT 30,5; -- skip 30, fetch 5
- NULL sort order implementation-dependent (some allow
NULLS FIRST|LAST)
GROUPING & AGGREGATION
- Scalar aggregate = single result, vector aggregate = multiple per group
- Rule: every SELECT column must appear in GROUP BY unless it is an aggregate argument
HAVING filters groups using aggregates- Example workflow
- FROM – build working set
- WHERE – row filter
- GROUP BY – create groups
- HAVING – group filter
- SELECT – projection & calculations
- ORDER BY – final sort
- Debug tip: build query progressively in processing order, check intermediate results
TYPICAL EXAMPLES IN CONTEXT
SELECT ProductDescription, ProductStandardPrice
FROM Product_T
WHERE ProductStandardPrice < 275;
- Count order lines for Order 1004:
SELECT COUNT(*) FROM OrderLine_T WHERE OrderID = 1004; - Customers per state, only states with >1 customer:
SELECT CustomerState, COUNT(*)
FROM Customer_T
GROUP BY CustomerState
HAVING COUNT(*) > 1;
ETHICAL / PRACTICAL IMPLICATIONS & BEST PRACTICES
- Referential integrity prevents orphaned data → quality & trust in analytics
- Proper typing & constraints reduce application-level validation code
- Over-indexing harms write performance; balance speed vs. cost
- Standardisation fosters skill portability but watch proprietary features
- Always test DML on small sets or within transactions; rollback if unexpected
- Provide meaningful names for objects & constraints to ease maintenance
RECAP & KEY TAKEAWAYS
- SQL serves three roles: DDL, DML, DCL
- Standard evolution continues; SQL:2016 adds JSON, temporal tables, row pattern matching
- Catalog → Schema → Objects hierarchy central to metadata management
- Designing tables involves choosing right data types, primary/foreign keys, constraints & indexes
- CRUD operations (
INSERT, UPDATE, DELETE) obey referential constraints; MERGE simplifies upserts - SELECT is declarative & set-oriented; mastering WHERE, GROUP BY, HAVING, ORDER BY, functions & subqueries enables powerful analytics
- Performance tuning rests on indexing, physical design, statistics, and query rewriting
- Always consider null semantics, Boolean logic precedence, and standard vs. vendor-specific syntax during development