SQL OVERVIEW
- Structured Query Language (SQL) = standard language for Relational DataBase Management Systems (RDBMSs)
- Used to create, manipulate, and control relational databases
- Runs on mainframes, servers, PCs, mobile and cloud platforms
- May be invisible to end-users (hidden behind GUIs, forms, or QBE interfaces)
- Historical milestones
- 1970 : E. F. Codd proposes the relational model
- 1974{-}1979 : IBM System R proves relational feasibility ➜ internal query language “Sequel” ⇒ SQL
- 1979 : Oracle ships the first commercial DBMS supporting SQL
- 1981 : IBM SQL/DS released
- 1986 : ANSI / ISO ratify first standard – SQL/86 (a.k.a. SQL-1)
- Later revisions: SQL/89, SQL/92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016
- Purposes
- Define common syntax & semantics
- Specify minimum/complete feature sets for portability
- Provide basis for enhancements
- Advantages
- Lower training costs (one language)
- Developer productivity & skill longevity
- Application portability/longevity
- Reduced vendor lock-in ➜ healthy competition & lower TCO
- Cross-system communication (heterogeneous DB links)
- Disadvantages / criticisms
- May stifle innovation; hard to change (vested vendor interests)
- Vendors add proprietary extensions ➜ portability suffers
- Early versions lacked referential integrity & several relational operators
MARKET LANDSCAPE & EMERGING TECHNOLOGIES
- Mature relational market; dominant products (mid-2020): Oracle, MySQL, Microsoft SQL Server
- Key open-source RDBMSs: MySQL, PostgreSQL, MariaDB
- NoSQL & Hadoop ecosystems rising, but many still expose SQL or SQL-like layers (Hive QL, Spark SQL)
THE SQL ENVIRONMENT
- Components of an SQL-based application
- User interface (form, report, GUI, web page…)
- Database tables (base tables)
- RDBMS executing SQL engine
- Catalog hierarchy
- Catalog ⇒ Collection of schemas ⇒ Objects (tables, views, indexes, constraints…)
- Information Schema = meta-catalog of all objects (tables, attributes, privileges…)
- SQL command families
- DDL (Data Definition) – CREATE, ALTER, DROP
- DML (Data Manipulation) – SELECT, INSERT, UPDATE, DELETE, MERGE
- DCL (Data Control) – GRANT, REVOKE; plus transaction control (COMMIT, ROLLBACK)
SQL DATA TYPES
- Numeric: INTEGER, SMALLINT, BIGINT, NUMERIC(p,s), DECIMAL(p,s), DECFLOAT
- Character
- Fixed length: CHARACTER(n) / CHAR(n)
- Variable: CHARACTER VARYING(n) / VARCHAR(n) (Oracle VARCHAR2)
- Binary Large Objects: BLOB, CLOB, NCLOB, BFILE
- Temporal: DATE, TIME, TIMESTAMP, TIMESTAMP WITH LOCAL TIME ZONE
- Boolean: BOOLEAN (TRUE/FALSE/UNKNOWN)
- Advanced (SQL:1999-2016): XML, MULTISET, JSON (vendor), spatial, graphic, image
- Good practice
- Choose data type by intended operations (math ⇒ numeric, searching ⇒ proper length, date arithmetic ⇒ temporal)
- Use domains & CHECK constraints for semantic validation
DEFINING DATABASE STRUCTURES (DDL)
- Creating a schema:
CREATE SCHEMA schema_name AUTHORIZATION owner_userid;
- Command summary
CREATE TABLE
– base or derived (from AS SELECT
)CREATE VIEW
– logical table; may be updateable with restrictionsCREATE INDEX
– performance aid (primary / secondary, single / composite)CREATE DOMAIN
, CREATE CHARACTER SET
, CREATE COLLATION
, CREATE TRANSLATION
, CREATE ASSERTION
- Reversing objects –
DROP TABLE | VIEW | INDEX … [RESTRICT | CASCADE]
- Modifying structures –
ALTER TABLE
(ADD column, ALTER column, DROP column, ADD constraint)
TABLE DESIGN & CONSTRAINTS
- Column definition clause:
column_name data_type [NOT NULL] [DEFAULT expr] [constraint]
- Key constraints
PRIMARY KEY (col [, …])
– unique + not nullUNIQUE (col [, …])
– alternate candidate keysFOREIGN KEY … REFERENCES parent(col)
– referential integrity
- Referential integrity actions
- ON UPDATE / ON DELETE { RESTRICT | CASCADE | SET NULL | SET DEFAULT }
- Additional constraints
CHECK (predicate)
– enforce domain/business rule- Named constraints strongly recommended:
CONSTRAINT Order_PK PRIMARY KEY (OrderID)
- Identity / auto-incrementing primary keys (SQL:2008):
CustomerID INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1 INCREMENT BY 1)
- Copy structure:
CREATE TABLE New_T LIKE Old_T;
(SQL:2008 allows copying constraints) - Dual table (Oracle/MySQL): dummy one-row table for selecting expressions (
SELECT SYSDATE FROM DUAL;
)
CHANGING & REMOVING TABLES
ALTER TABLE …
usage examples- Add column:
ALTER TABLE Customer_T ADD CustomerType VARCHAR2(10) DEFAULT 'Commercial';
- Drop column:
ALTER TABLE … DROP COLUMN colname;
- Add / drop constraints as needed (often disabled temporarily for bulk loads)
- Removing data structures
DROP TABLE TableName;
(plus RESTRICT / CASCADE)TRUNCATE TABLE
– delete all rows but keep structure (non-logged, fast)
INSERTING, UPDATING & DELETING DATA (DML)
- INSERT
- Full-row:
INSERT INTO Customer_T VALUES (001,'Contemporary Casuals',…);
- Partial-row:
INSERT INTO Product_T (ProductID, ProductDescription, …) VALUES (1,'End Table',…);
- Copy rows:
INSERT INTO CaCustomer_T SELECT * FROM Customer_T WHERE CustomerState = 'CA';
- Batch loaders
- Oracle
SQL*Loader
, SQL Server BULK INSERT
, MySQL LOAD DATA INFILE
- DELETE
- Conditional:
DELETE FROM Customer_T WHERE CustomerState = 'HI';
- All rows:
DELETE FROM Customer_T;
(or TRUNCATE TABLE
) - Always test with
SELECT …
first to preview
- UPDATE
- Example:
UPDATE Product_T SET ProductStandardPrice = 775 WHERE ProductID = 7;
- Set to NULL:
SET column = NULL
- MERGE (UPSERT) SQL:2008 syntax combines INSERT & UPDATE (see transcript example)
- Indexing guidelines
- Create indexes on primary & frequently-searched secondary keys ➜ faster
SELECT
, slower INSERT
/UPDATE
- Types: unique / non-unique / full-text / spatial / hash (MySQL), ascending/descending
- Oracle syntax:
CREATE INDEX Name_IDX ON Customer_T (CustomerName);
- Drop when bulk-loading or if seldom used:
DROP INDEX Name_IDX;
- Physical file organisation
- Choose clustering, partitioning, sorted files to match workload
- Allocate extra space for growth; maintain statistics for optimiser
QUERYING SINGLE TABLES – SELECT STATEMENT
- Basic clause order to write:
SELECT … FROM … WHERE … ORDER BY …;
- Actual processing order: \text{FROM} \rightarrow \text{WHERE} \rightarrow \text{GROUP BY} \rightarrow \text{HAVING} \rightarrow \text{SELECT} \rightarrow \text{ORDER BY}
- Key clauses
SELECT
list – columns / expressions / DISTINCT
/ *
FROM
– base or derived tables, viewsWHERE
– row selection joins, predicates, subqueriesORDER BY
– final sort (ASC default / DESC)GROUP BY
– form groups for aggregates; columns in SELECT must appear in GROUP BY unless aggregatedHAVING
– filter groups (aggregate predicates)
- Aliases improve readability:
SELECT CUST.CustomerName AS Name … FROM Customer_T CUST …
EXPRESSIONS & FUNCTIONS
- Operators: +, –, *, /, % (modulo)
- Example: price plus 10 %:
ProductStandardPrice*1.1 AS Plus10Percent
- Precedence: Parentheses → * / → + -
- Scalar functions
- Math:
ROUND
, TRUNC
, MOD
, POWER
, SQRT
, LN
, EXP
, FLOOR
, CEILING
- Aggregate:
COUNT
, SUM
, AVG
, MIN
, MAX
- String:
LOWER
, UPPER
, INITCAP
, CONCAT
, SUBSTR
, COALESCE
- Date:
NEXT_DAY
, ADD_MONTHS
, MONTHS_BETWEEN
- Analytical (vendor):
TOP
, WIDTH_BUCKET
, window functions (not detailed here)
COMPARISON, LOGIC & PATTERN MATCHING
- Comparison operators: =, <>, !=, >, >=, <, <=
- Wildcards with
LIKE
%
any string; _
single charLIKE '%Desk'
➜ ends with ‘Desk’LIKE '_-drawer'
➜ pattern such as “3-drawer”
- Boolean operators
NOT
evaluated first ➜ AND
➜ OR
(use parentheses to override)
- Ranges & lists
BETWEEN low AND high
/ NOT BETWEEN
IN (… list …)
/ subquery lists; NOT IN
DISTINCT
removes duplicate rows in final projection (only once per query)- NULL predicates:
IS NULL
, IS NOT NULL
(NULL ≠ zero or blank)
ORDER BY column1 [, column2] [DESC]
(position numbers allowed: ORDER BY 3,1
)- LIMIT / OFFSET clauses (MySQL) or
FETCH FIRST n ROWS ONLY
(Oracle 12c):
ORDER BY 3,1 LIMIT 30,5; -- skip 30, fetch 5
- NULL sort order implementation-dependent (some allow
NULLS FIRST|LAST
)
GROUPING & AGGREGATION
- Scalar aggregate = single result, vector aggregate = multiple per group
- Rule: every SELECT column must appear in GROUP BY unless it is an aggregate argument
HAVING
filters groups using aggregates- Example workflow
- FROM – build working set
- WHERE – row filter
- GROUP BY – create groups
- HAVING – group filter
- SELECT – projection & calculations
- ORDER BY – final sort
- Debug tip: build query progressively in processing order, check intermediate results
TYPICAL EXAMPLES IN CONTEXT
SELECT ProductDescription, ProductStandardPrice
FROM Product_T
WHERE ProductStandardPrice < 275;
- Count order lines for Order 1004:
SELECT COUNT(*) FROM OrderLine_T WHERE OrderID = 1004;
- Customers per state, only states with >1 customer:
SELECT CustomerState, COUNT(*)
FROM Customer_T
GROUP BY CustomerState
HAVING COUNT(*) > 1;
ETHICAL / PRACTICAL IMPLICATIONS & BEST PRACTICES
- Referential integrity prevents orphaned data → quality & trust in analytics
- Proper typing & constraints reduce application-level validation code
- Over-indexing harms write performance; balance speed vs. cost
- Standardisation fosters skill portability but watch proprietary features
- Always test DML on small sets or within transactions; rollback if unexpected
- Provide meaningful names for objects & constraints to ease maintenance
RECAP & KEY TAKEAWAYS
- SQL serves three roles: DDL, DML, DCL
- Standard evolution continues; SQL:2016 adds JSON, temporal tables, row pattern matching
- Catalog → Schema → Objects hierarchy central to metadata management
- Designing tables involves choosing right data types, primary/foreign keys, constraints & indexes
- CRUD operations (
INSERT
, UPDATE
, DELETE
) obey referential constraints; MERGE
simplifies upserts - SELECT is declarative & set-oriented; mastering WHERE, GROUP BY, HAVING, ORDER BY, functions & subqueries enables powerful analytics
- Performance tuning rests on indexing, physical design, statistics, and query rewriting
- Always consider null semantics, Boolean logic precedence, and standard vs. vendor-specific syntax during development