SQL Notes

Why Study Databases?

  • Financial Reasons:
    • Startups require database expertise.
    • There's a massive industry demand for database professionals, leading to higher earning potential ($).
  • Intellectual Reasons (Research):
    • Science is transitioning from data-poor to data-rich environments.
    • There's a challenge in handling the large volumes of data.
    • Database concepts are fundamental and integrated with various computer science fields.
      • Systems, theory, AI, logic, statistics, and analysis.
    • Many significant computer system ideas originated in the database field.

Database and DBMS

  • Database: A structured collection of data.
  • DBMS (Database Management System): Software for managing and accessing databases.
  • User Interaction: Users interact with the DBMS to query and manage the database.
    • Example: A user might ask, "What is the average annual income of a Singapore tax payer?"

Tables, Relations, Relational Model

  • Data Representation: Data is organized in tables (also referred to as relations).
  • Income_Table Example:
    • Columns: TaxpayerID, AnnualIncome
    • Rows: (51248297, 100000), (33891634, 50000), and so on.
  • Relational Model: The conceptual model for organizing data into tables.

Structured Query Language (SQL)

  • Purpose: SQL is used to interact with the database to retrieve, insert, update, and delete data.
  • SQL Query Example:
    • To find the average annual income from the Income_Table, the SQL query would be:
      SELECT avg(Annual_Income) FROM Income_Table
  • More details about SQL will be covered later in the course.

Querying RDBMS

  • RDBMS (Relational Database Management System): A DBMS based on the relational model.
  • Components:
    • Database: The collection of data.
    • DBMS: The software to manage the database.
    • SQL: The language used to query the database.
    • DBMS Interface: Provides an interface for users and applications to interact with the DBMS.
    • Relational Algebra: Theoretical foundation for relational operations.

Roadmap

  • Next lectures will cover:
    • Introduction to SQL
    • SELECT, FROM, WHERE clauses
    • Eliminating duplicates
    • Renaming attributes
    • Expressions in SELECT Clause
    • Patterns for Strings
    • Ordering
    • Joins

What is SQL?

  • Definition: Structured Query Language (SQL) is the standard query language for relational databases.
  • Pronunciation: "S-Q-L" or "sequel"
  • History:
    • 1974: First proposal of SEQUEL (IBM Research, System R).
    • 1981: First implementation in SQL/DS (IBM) and Oracle.
    • 1983: "De facto standard" emerged.
    • 1986: Became an official standard defined by ANSI and ISO.
  • Standards Evolution:
    • ANSI SQL89
    • ANSI SQL92 (SQL2)
    • ANSI SQL99 (SQL3)
    • ANSI SQL 2003 (added OLAP, XML, etc.)
    • ANSI SQL 2006 (added more XML)
    • ANSI SQL 2016 (added pattern matching, JSON, etc.)

Present Day: Big Data Analytics

  • Landscape: The slide displays a variety of database technologies and vendors.
    • Structured DB: Oracle, MySQL, PostgreSQL, etc.
    • Infrastructure Operational.
    • As A Service.
    • Technologies: HBASE, Cassandra, Hadoop, etc.
  • New technology maintains the same SQL Principles.

SQL Standards

  • Vendor Conformance: Major database vendors (Oracle, IBM, Microsoft, Sybase) generally conform to the SQL standard.
  • Proprietary Extensions: Database companies often add proprietary extensions, leading to different SQL dialects.
  • Incompatibilities:
    • Exist between systems.
    • Can arise with newer standards (e.g., triggers in SQL:1999).
  • Focus: The course concentrates on the principles of SQL, primarily using SQL92 as a basic subset.

Learning SQL: Good Practices

  • Hands-on Practice:
    • Install a DBMS (e.g., PostgreSQL, MySQL) on your machine.
    • Set up a database and tables with sample data.
    • Run SQL queries and debug.
  • Resources:
    • SQL server in the school lab.
    • Google errors (Stack Overflow).
    • Consult textbooks.
    • Online resources like w3schools.com.
    • Comparison of SQL implementations by Troels Arvin.

SQL Operations

  • Actions on Relations:
    • Retrieve data.
    • Insert data.
    • Delete data.
    • Update data.
  • Purpose: SQL is used to manage and query databases.

Declarative Language

  • SQL is Declarative: SQL is a declarative (non-procedural) language.
    • Specifies what to retrieve, not how to retrieve it.
  • Procedural Language Example:
    • C, C++, Java, where you write step-by-step instructions.
  • SQL Limitations:
    • SQL is not a complete programming language.
    • It lacks control or iteration commands.

SQL Capabilities

  • Queries: SQL is used to perform queries.
  • Updates: SQL is used to add, delete, and modify data.
  • Data Definition Language (DDL):
    • Creates databases, tables, and indices.
    • Creates views.
    • Specifies integrity constraints.
  • Embedded SQL:
    • Wrap a high-level programming language around DML to perform more advanced queries/updates.
  • Data Manipulation Language (DML):

Tables in SQL

  • Relation/Table: A multiset of tuples (rows) with attributes defined by the schema.
  • Schema: Contains the name of the relation and the set of attributes.
  • Example:
    • Table Name: Product
    • Attributes: PName, Price, Category, Manufacturer
  • Multiset:
    • An unordered list that allows duplicate instances.
    • List: [1, 1, 2, 3]
    • Set: {1, 2, 3}
    • Multiset: {1, 1, 2, 3}

Attributes (Columns) in a Table

  • Definition: An attribute (or column) is a typed data entry in each tuple of the relation.
  • Example from Product Table:
    • PName, Price, Category, Manufacturer are attributes.
    • Each tuple contains values for these attributes.

Tuples (Rows) in a Table

  • Definition: A tuple (or row) is a single entry in the table with the attributes specified by the schema.
  • Alternate Name: Also referred to as a record.
  • Example from Product Table:
    • (iPhone x, 888, Phone, Apple) is a tuple.

Data Types in SQL

  • Character Strings:
    • CHAR(20)
    • VARCHAR(50)
  • Numbers:
    • INT
    • FLOAT
  • Others:
    • BOOLEAN
    • DATETIME
  • Every attribute must have a type.

Key of a Table

  • Definition: A key is an attribute (or set of attributes) whose values are unique.
  • Primary Key: A key that uniquely identifies each tuple in the table.
    • Underlined in the schema definition, e.g., Product(PName, Price, Category, Manufacturer).

Principle Form of SQL

  • Basic Structure:
    SELECT desired_attributes (A1, A2, …, An)
    FROM one_or_more_tables (R1, R2, …, Rm)
    WHERE condition_about_tuples_of_the_tables (P)
  • Mapping to Relational Algebra:
    \Pi A1, A2, …, An (\sigma P (R1 \times R2 \times … \times Rm))

Simple SQL Query

  • Product Table Example:
    • Shows sample data in the Product table.
  • SQL Query:
    SELECT * FROM Product WHERE Category = ‘Phone’
  • Explanation:
    • Retrieve all attributes (*) from the Product table where the Category is 'Phone'.
    • This is a “selection” operation.

Simple SQL Query (cont.)

  • SQL Query:
    SELECT * FROM Product WHERE Category <> ‘Phone’
  • Explanation:
    • The <> operator means "not equal to."
    • This query selects all rows where the category is not 'Phone'.

Simple SQL Query (AND)

  • SQL Query:
    SELECT * FROM Product WHERE Category = ‘Phone’ AND Price > 800
  • Explanation:
    • Selects all attributes from the Product where the Category is 'Phone' AND the Price is greater than 800.

Simple SQL Query (OR)

  • SQL Query:
    SELECT * FROM Product WHERE Category = ‘Tablet’ OR Price > 1000
  • Explanation:
    • Selects all attributes from the Product table where the Category is 'Tablet' OR the Price is greater than 1000.

Simple SQL Query (Selection and Projection)

  • SQL Query:
    SELECT PName, Price, Manufacturer FROM Product WHERE Price > 800
  • Explanation:
    • Selects the PName, Price, and Manufacturer attributes from the Product table where the Price is greater than 800.
    • Combines selection and projection operations.

Simple SQL Query (WHERE Clause: BETWEEN)

  • SQL Query:
    SELECT PName, Price, Manufacturer FROM Product WHERE Price BETWEEN 800 AND 1200
  • Explanation:
    • Selects the PName, Price, and Manufacturer from the Product table where the Price is between 800 and 1200 (inclusive).
  • SQL Query:
    SELECT PName, Price, Manufacturer FROM Product WHERE Manufacturer IN (‘Huawei’, ‘Canon’)
  • Explanation:
    • Selects the PName, Price, and Manufacturer from the Product table where the Manufacturer is either 'Huawei' or 'Canon'.

SQL Syntax

  • Reserved Words:
    • A set of reserved words (e.g., SELECT, FROM, WHERE) cannot be used as names for tables or attributes.
  • Constants:
    • Use single quotes for constants (e.g., 'abc' is okay, "abc" is not).
  • Case-Insensitivity:
    • SQL is generally case-insensitive.
    • Exception: string constants are case-sensitive (e.g., 'FRED' is not the same as 'fred').
  • Whitespace:
    • Whitespace is ignored.
  • Semicolon:
    • All statements end with a semicolon (;).

Eliminating Duplicates

  • SQL Query:
    SELECT Category FROM Product
  • Output:
    • May contain duplicate categories.

Eliminating Duplicates (cont.)

  • SQL Query:
    SELECT DISTINCT Category FROM Product
  • Explanation:
    • DISTINCT keyword eliminates duplicate values in the result set.

AS: Renaming Attributes

  • SQL Query:
    SELECT PName AS Product, Price AS Cost, Manufacturer FROM Product WHERE Category = ‘Phone’
  • Explanation:
    • AS keyword renames attributes in the output.
    • PName is renamed to Product, and Price is renamed to Cost.

Expressions in SELECT Clause

  • SQL Query:
    SELECT PName, Price*1.4 AS Cost_IN_SGD, Manufacturer FROM Product WHERE Category = ‘Phone’
  • Explanation:
    • Calculates a new attribute Cost_IN_SGD by multiplying Price by 1.4.
  • SQL Queries:
  • SELECT * FROM Product WHERE PName LIKE ‘iPh%’
  • SELECT * FROM Product WHERE PName LIKE ‘%Phone x%’
  • SELECT * FROM Product WHERE PName LIKE ‘%P%e%’
  • SELECT * FROM Product WHERE PName LIKE ‘ _Phone x’
  • SELECT * FROM Product WHERE PName LIKE ‘ Phone_ ’
  • SELECT * FROM Product WHERE PName NOT LIKE ‘ Phone_ ’
  • Explanation:
    • % stands for "any string"
    • _ stands for "any character"

Patterns for Strings (cont.)

  • '_ _ _' – Matches any string of exactly three characters
  • '_ _ _%' – Matches any string of at least three characters
  • 'ab\%cd%' – Match all strings beginning with "ab%cd"

Ordering the Results (cont.)

  • SQL Query:
    SELECT PName, Price FROM Product WHERE Price < 800 ORDER BY PName DESC
  • Explanation:
    • Orders the result set by PName in descending order.

Ordering the Results (cont.)

  • SQL Query:
    SELECT PName, Category FROM Product WHERE Price < 1000 ORDER BY Category, PName
  • Explanation:
    • Orders the result set first by Category (ascending by default) and then by PName (ascending by default).

Ordering the Results (cont.)

  • SQL Query:
    SELECT PName, Category FROM Product WHERE Price < 1000 ORDER BY Category DESC, PName
  • Explanation:
    • Orders the result set first by Category in descending order and then by PName in ascending order.

Ordering the Results (cont.)

  • SQL Query:
    SELECT PName, Category FROM Product WHERE Price < 1000 ORDER BY Category DESC, PName DESC
  • Explanation:
    • Orders the result set first by Category in descending order and then by PName in descending order.

Exercise

  • What is the output of the following query?
    SELECT DISTINCT Category FROM Product ORDER BY Category

Exercise (cont.)

  • Corrected Query and Explanation:
    • The correct query should extract distinct categories and order them.
  • The final correct query:
    SELECT DISTINCT Category FROM Product ORDER BY Category

Exercise

  • Why the following query is incorrect?
    SELECT DISTINCT Category FROM Product ORDER BY Category WHERE Price < 1000

Exercise (cont.)

  • Explanation:
    • "WHERE" should always precede “ORDER BY”.

Exercise (cont.)

  • The following query is incorrect:
    SELECT DISTINCT Category FROM Product ORDER BY PName
  • Explanation:
    • “ORDER BY” items must appear in the select list if “SELECT DISTINCT” is specified.

Joins

  • Scenario:
    • Given two tables, Product and Company.
  • Problem:
    • A user wants to know the names and prices of all products by Japan companies.
  • SQL Query:
    SELECT PName, Price FROM Product, Company WHERE Country = ‘Japan’ AND Manufacturer = CName

Meaning (Semantics) of Join – An Example

  • SQL Query:
    SELECT R.A FROM R, S WHERE R.A = S.B
  • Steps:
    • Take cross product: X = R \times S
    • Apply selections / conditions:
    • Apply projections to get the final output.
  • Filtering! => Returning only some attributes

How Join is Actually Executed

  • The preceding slides show what a join means (i.e., semantics).
  • Not actually how the DBMS executes it under the covers.

Joins

  • Find the names of the persons who work for companies in USA
    • SELECT PName FROM Person, Company WHERE Country = 'USA' AND WorksFor = CName

Joins

  • Find the names of the persons who work for companies in USA, as well as their company addresses
    • SELECT PName, Company.Address FROM Person, Company WHERE Country = 'USA' AND WorksFor = CName

Joins

  • Find the names of the persons who work for companies in USA, as well as their company addresses
    • SELECT X.PName, Y.Address FROM Person AS X, Company AS Y WHERE Y.Country = 'USA' AND X.CName = Y.CName

Exercise

  • Find the names of the companies in China that produce products in the ‘tablet’ category
    • SELECT DISTINCT CName FROM Company, Product WHERE Manufacturer = CName AND Country = ‘China’ AND Category = ‘Tablet’

Exercise

  • Find the names of the companies in China that produce products in the ‘tablet’ or ‘phone’ category
    • SELECT DISTINCT CName FROM Company, Product WHERE Manufacturer = CName AND Country = ‘China’ AND (Category = ‘Tablet’ OR Category = ‘Phone’)

Exercise

  • Find the manufacturers that produce products in both the ‘tablet’ and ‘phone’ categories
  • SELECT DISTINCT X.Manufacturer FROM Product AS X, Product AS Y WHERE X.Manufacturer = Y.Manufacturer AND X.Category = ‘Tablet’ AND Y.Category = ‘Phone’$$