Intermediate SQL Concepts

Join Expressions

  • Join operators are typically used in the FROM clause as subquery expressions.

    • Inner Join: Includes natural join, join using (attribute) for equality conditions, and join on (predicate) for theta-joins.

    • Outer Join: Extends the inner join by including unmatched tuples.

    • (Cross) Join: Creates a Cartesian product.

  • Implicit joins (e.g., SELECT * FROM student, takes WHERE student.ID = takes.ID) are equivalent to theta-joins with an equality predicate.

  • The INNER keyword is optional (e.g., natural join ≡ inner natural join).

  • Theta-Join (<em>θ\bowtie<em>\theta): Cartesian product followed by a selection (e.g., σ</em>student.id=takes.id(student×takes)\sigma</em>{student.id = takes.id}(student \times takes)) resulting in  studentstudent.id=takes.idtakes\ student \bowtie_{student.id = takes.id} takes

Joins in Relational Algebra

  • Theta-Join:  r<em>θs=σ</em>θ(r×s)\ r \bowtie<em>{\theta} s = \sigma</em>{\theta}(r \times s)

    • Duplicate columns are not eliminated.

    • Equi-join: A theta-join where θ\,\theta is an equality of two attributes.

  • Natural Join: ˚s=Π<em>L[σ</em>C(r×s)]\r \bowtie s = \Pi<em>L [\sigma</em>C (r \times s)], where

    • C: r.A1= s.A1 \land r.A2= s.A2 \land . . . \land r.An= s.An, {A1, A2, . . . , An} is the set of common attributes of r and s.

    • L: Union of all attributes in r and s.

    • Eliminates one of the duplicate columns.

    • If r and s have no common attributes, r ⋈ s = r × s; there is no selection condition, every tuple is selected

    • Equijoin ensures equality condition on common attributes without duplicate columns.

Natural Join in SQL

  • Matches tuples with the same values for all common attributes.

  • Retains only one copy of each common column.

  • Achieves a theta-join effect but duplicate columns are removed.

  • Example:

    • SELECT * FROM student, takes WHERE student.ID = takes.ID (theta-join, duplicate columns not removed).

    • SELECT * FROM student NATURAL JOIN takes (natural join, duplicate columns removed).

Natural Join with Using Clause

  • JOIN ... USING (attributes) specifies equated columns.

  • It helps to avoid equating attributes erroneously.

  • Removes one of the duplicate columns.

  • Example: Find names of students along with titles of courses that they take.

    • SELECT name, title FROM (student NATURAL JOIN takes) JOIN course USING (course_id)

Join Conditions

  • JOIN ... ON predicates allows general predicates over the relations being joined.

  • Almost the same as a WHERE clause predicate.

  • Example: Find students who take some courses.

    • SELECT * FROM student JOIN takes ON student.ID = takes.ID is equivalent to SELECT * FROM student, takes WHERE student.ID = takes.ID

  • Theta-join duplicates columns are not removed

  • SELECT * FROM student JOIN takes; is equivalent to cartesian product.

Outer Join

  • Computes the join and keeps tuples that have no matching tuples to the result of the join.

  • Three forms: left, right, and full outer join.

    • NATURAL [LEFT/RIGHT/FULL] OUTER JOIN

    • [LEFT/RIGHT/FULL] OUTER JOIN USING (attributes)

    • [LEFT/RIGHT/FULL] OUTER JOIN ... ON predicates

  • Keeps dangling tuples.

Left Outer Join

  • NATURAL LEFT OUTER JOIN

  • In relational algebra:  r ⁣s=(rs)((rΠrattributes(rs))×(null,...,null)\ r \Join \mkern-10mu {\stackrel{\text{}}{\leftarrow}} \mkern-4mu s = (r⋈s) \cup ((r – \Pi_{r_attributes}(r⋈s)) \times {(null, . . . , null)}

    • (null,...,null){(null, . . . , null)}: singleton relation with attributes of s not in r.

    • Keeps dangled r-tuples.

Note: not exists, antijoin, left outer join .. is null

  • Find IDs and names of instructors who did not teach any course in 2018.

  • Uses NATURAL LEFT OUTER JOIN to identify instructor tuples that do not join with any 2018 teach tuple.

  • Example:

    • SELECT ID, name FROM instructor NATURAL LEFT OUTER JOIN (SELECT* FROM teaches WHERE teaches.year = 2018) t WHERE t.ID IS NULL;

  • Equivalent to finding the difference between all instructors and instructors who taught in 2018.

    • SELECT ID, name FROM instructor EXCEPT SELECT DISTINCT i.ID, i.name FROM instructor i, teaches t WHERE i.ID=t.ID AND t.year=2018;

Right Outer Join

  • NATURAL RIGHT OUTER JOIN

  • In relational algebra:  r ⁣s=(rs)((null,...,null)×(sΠattributesofs(rs))\ r \Join \mkern-10mu {\stackrel{\text{}}{\rightarrow}} \mkern-4mu s = (r⋈s) \cup ({(null, . . . , null)} \times (s – \Pi_{attributes \, of \, s}(r⋈s))

    • (null,...,null){(null, . . . , null)}: singleton relation with attributes of r not in s

Full Outer Join

  • NATURAL FULL OUTER JOIN

  • In relational algebra:  course ⁣prereq\ course \Join \mkern-10mu {\stackrel{\text{}}{\leftrightarrow}} \mkern-4mu prereq

  • Outer join with join condition θ\,\theta in Relational Algebra

    • left outer join θ\Join_{\theta}

    • right outer join θ\Join_{\theta}

    • full outer join θ\Join_{\theta}

outer join .. using, outer join .. on

  • LEFT/RIGHT/FULL OUTER JOIN .. USING (attribute)

  • LEFT/RIGHT/FULL OUTER JOIN .. ON (predicate)

  • Duplicate columns not removed.

Summary: Join Types and Join Conditions

  • Join Type:

    • How unmatched tuples are treated.

      • Inner Join: Does not preserve unmatched tuples (INNER is optional).

        • Natural Join, Theta Join

      • Outer Join: Preserves unmatched tuples.

        • Left, Right, Full Outer Join

  • Join Condition:

    • Which tuples in the two relations match.

      • ON (theta-join)

      • NATURAL

      • USING (A1, A2, . . ., An) (natural join with specified columns)

Views

  • A view is a virtual relation that is not physically stored.

  • It provides a mechanism to hide certain data from certain users.

  • Example: Creating a view instructor_without_salary to hide instructor salaries.

  • Avoids information repetition and data inconsistency.

Views (cont.)

  • The CREATE VIEW statement:

    • CREATE VIEW v AS <query expression>

    • The view can be used as if it were a stored relation.

  • Evaluation of a query involving views:

    • Query expression is modified by substituting a view definition for the view name (view expansion).

Note: View expansion

  • Example: Creating a view faculty (instructors without salary).

    • CREATE VIEW faculty AS SELECT ID, name, dept_name FROM instructor

  • Querying the view:

    • SELECT name FROM faculty WHERE dept_name = 'Biology' (undergoes view expansion).

  • Views are useful when:

    • There are many queries using common predicates.

    • Computing complex queries.

Views (cont.)

  • Views defined by other views:

    • Views can be layered, where one view is defined based on another.

  • Example:

    • CREATE VIEW V1 SELECT . . . FROM . . . WHERE . . .

    • CREATE VIEW V2 SELECT . . . FROM V1, . . . WHERE . . .

    • CREATE VIEW Vn SELECT . . . FROM Vi, Vj, . . . WHERE . . .

Views (cont.)

  • Example: View physics_fall_2017 (Physics courses offered in 2017 fall semester).

    • CREATE VIEW physics_fall_2017 AS SELECT c.course_id, sec_id, building, room_number FROM course c, section s WHERE c.course_id = s.course_id AND c.dept_name = 'Physics' AND s.semester = 'Fall' AND s.year = 2017;

  • Example: View physics_fall_2017_watson (Physics courses offered in 2017 fall semester in Watson building).

    • CREATE VIEW physics_fall_2017_watson AS SELECT course_id, room_number FROM physics_fall_2017 WHERE building= 'Watson';

Note: Programmer Productivity vs Performance

  • Illustrates how view expansion can lead to more complex queries.

  • Complex queries are created from simple views.

  • View expansion replaces view names with their underlying query definitions.

Views: Updatability

  • Modifications on a view lead to modification of stored relation(s) (insert, delete, update).

  • Some databases do not allow modification.

  • Problem: Insertion to a view may not have values for all attributes of the underlying relation.

    • Two approaches: reject insertion or use null values for omitted attributes (not allowed if there are not null constraints!).

Views: Updatability (cont.)

  • Problem: A view defined by multiple relations may lead to issues during updates.

    • Example: Inserting into instructor_location view may require inserting into both instructor and department tables, potentially causing integrity constraint violations.

Views: Updatability (cont.)

  • Updatable views in SQL:

    • Most SQL implementations allow updates only on simple views.

    • Theoretical rules are complex, but generally:

      • FROM clause has only one relation.

      • SELECT clause contains only attribute names (no expressions, aggregates, or distinct).

      • No GROUP BY or HAVING clause.

      • Every NOT NULL attribute in the underlying relation must be part of the view.

  • Simple attributes from one relation enforce rules such as no joins, aggregation, distinct, or expressions.

Views: Materialized Views

  • A materialized view is physically stored.

  • Updates to underlying stored relations require updates to the materialized view.

  • Example: CREATE MATERIALIZED VIEW student_course AS SELECT ID, name, course_id FROM student NATURAL JOIN takes

    • Also called a base relation.

Views: Materialized Views

  • Materialized view maintenance:

    • Periodic reconstruction: May not be acceptable for applications that require up-to-date data.

    • Incremental maintenance: Recompute only parts of the materialized view affected by changes.

  • Many materialized views can be maintained incrementally, but not all.

  • Maintaining materialized views involves addressing view updatability problems.

Transactions

  • A transaction is a unit of work consisting of a sequence of query and/or update statements.

  • Implies an ACID transaction (Atomicity, Consistency, Isolation, Durability).

  • A transaction demarcation:

    • Begins implicitly when an SQL statement is executed.

    • Ends with COMMIT WORK (updates become permanent) or ROLLBACK WORK (updates are undone).

  • Autocommit: Each SQL statement is executed in its own transaction (implicitly committed).

  • Explicit transaction boundaries: BEGIN . . . COMMIT/ROLLBACK.

Note: Transaction in Online SQL Interpreter

  • Autocommit mode is default in SQLite.

  • There is no active transaction when "rollback" is issued in autocommit mode.

Note: ACID properties of transaction

  • Atomicity: All-or-nothing execution.

  • Consistency: Preserve the consistency (integrity constraints) of the database.

  • Isolation: Execute as if no other transaction is executing at the same time.

  • Durability: Effect on the database must never be lost once the transaction has completed.

Note: Isolation

  • Example: Airline reservation system where two customers concurrently try to reserve the same seat.

  • Illustrates a "lost update" problem when operations of User1 are not isolated from User2 operations.

Note: Atomicity

  • Example: Fund transfer between bank accounts.

  • If a crash occurs mid-transaction, database operations need to be executed atomically.

  • Simple fix: Execute the entire transaction again or change the execution sequence.

Integrity Constraints (IC)

  • Guard against accidental damage to the database and enforce organizational rules.

  • Ensure that changes to the DB do not result in a loss of data consistency.

  • Major topics:

    • Constraints on a single relation.

    • Referential integrity (cascading actions).

    • Deferred checking of constraints.

    • Subqueries in check conditions.

    • Assertions (rules for correctness of DB).

IC: Constraints on a Single Relation

  • Not Null

  • Primary Key

  • Unique

  • Check (P), where P is a predicate

IC: Constraints on a Single Relation (cont.)

  • NOT NULL constraint

    • Example: Declare building and budget to be not null in the department table.

    • CREATE TABLE department (dept_name VARCHAR(20), building VARCHAR(15) NOT NULL, budget NUMERIC(12,2) NOT NULL, PRIMARY KEY (dept_name));

  • UNIQUE (A1, A2, …, Am) constraint

    • States that the attributes \,A1, A2, …, Am form a superkey.

    • Superkey: Permitted to be null (in contrast to primary key).

    • Example:

      • CREATE TABLE employee (ID CHAR(5) PRIMARY KEY, name VARCHAR(20) NOT NULL UNIQUE, dept_name VARCHAR(20), salary NUMERIC(8,2));

IC: Constraints on a Single Relation (cont.)

  • CHECK(P) clause

    • Specifies a predicate P that must be satisfied by every tuple in a relation.

    • Example: Ensure that semester is one of {fall, winter, spring, summer}.

    • CREATE TABLE section (course_id VARCHAR (8), sec_id VARCHAR (8) DEFAULT '1', semester VARCHAR (6), year NUMERIC (4,0) CHECK (year BETWEEN 1900 AND 2100), building VARCHAR (15), room_number VARCHAR (7) CHECK (LENGTH(room_number) <= 5), time_slot_id VARCHAR (4), PRIMARY KEY (course_id, sec_id, semester, year), CHECK (semester IN ('Fall', 'Winter', 'Spring', 'Summer')));

  • The predicate in the check clause can be an arbitrary predicate, except for a subquery.

IC: Referential Integrity

  • Ensures that every value in R.A also appears in S.B.

  • Foreign Key Constraint:

    • Ensures every value in R.A also appears in the primary key of S.

    • An attribute(s) A is called a foreign key in relation R.

    • Example: CREATE TABLE instructor (. . . FOREIGN KEY (dept_name) REFERENCES department;

  • R.A can be NULL if not declared to be NOT NULL.

IC: Assigning Names to Constraints

  • Constraints can have names (useful for dropping constraints later).

    • CONSTRAINT <name> <constraint>

    • Example: Give a name to the primary key constraint in the department table.

      • dept_name VARCHAR(20) CONSTRAINT keyName PRIMARY KEY

    • Example: Give a name to the check constraint on attribute salary in the instructor table.

      • salary NUMERIC(8,2) CONSTRAINT minsalary CHECK (salary > 29000)

IC: Altering Constraints on Tables

  • ALTER TABLE <name> ADD CONSTRAINT <name> constraint

  • ALTER TABLE <name> DROP CONSTRAINT <name>

  • Example: ALTER TABLE department ADD CONSTRAINT uniqueBuilding UNIQUE (building);

  • To drop a constraint later on, a name is needed on the constraint.

Cases of Foreign Key Violation

  • Foreign key violation can occur during insert/update in the referencing table or delete/update in the referenced table.

    • Reject by default, but other options can be specified in the referencing relation to restore the constraint.

IC: Actions when Referential IC Violation

  • Default action: reject the action that caused the violation.

  • Other options on the referencing relation:

    • ON {DELETE | UPDATE} CASCADE (cascade deletion/update).

    • ON {DELETE | UPDATE} SET NULL

    • ON {DELETE | UPDATE} SET DEFAULT

IC Violation During Transactions

  • Cases that look difficult to avoid IC Violation

    • Every tuple refers to some other tuples

      • How to insert a tuple without causing constraint violation?

      • Not possible if spouse has not null constraint!

  • Common characteristics in previous two examples

    • IC violation can be temporary, and can be fixed within a transaction if we use an appropriate transaction boundary

  • We can avoid the problem if we can defer the constraint checking time

IC Violation During Transactions (cont.)

  • Common characteristics in previous two examples:

    • IC violation can be temporary, and can be fixed within a transaction if we use an appropriate transaction boundary

    • if we can defer the constraint checking time We can avoid the problem.

  • Common characteristics in previous two examples

    • tell the DBMS not to check the constraints until after the whole transaction has finished its action and is about to commit deferred checking of constraints

    • solution to avoid violation of IC constraints:
      *group the two insertions into a single transaction

Note: Deferred Checking of Constraints described in constraint specification

  • Constraint checking activation time:

    • Immediately when an event occurs (by default).

    • At the time of transaction commit (deferred).

Note: Deferred Checking of Constraints (cont.)

  • Allow the checking of the foreign-key constraint to be deferred until the end of the transaction.

  • CREATE TABLE course (course_id VARCHAR(8), title VARCHAR(50), dept_name VARCHAR(20), credits NUMERIC(2,0), PRIMARY KEY (course_id), CONSTRAINT foreignKey_dept_name FOREIGN KEY (dept_name) REFERENCES department DEFERRABLE INITIALLY DEFERRED );

Note: Deferred Checking of Constraints (cont.)

  • Change the checking time of a deferrable constraint: IMMEDIATE ↔ DEFERRED

    • Immediate → Deferred:

      • SET CONSTRAINT constraint-name DEFERRED

    • Deferred → Immediate:

      • SET CONSTRAINT constraint-name IMMEDIATE

Subquery in check (..), Assertion

  • Example:

    • CREATE TABLE section (course_id VARCHAR (8), sec_id VARCHAR (8), . . . time_slot_id VARCHAR (4), CHECK (time_slot_id IN (SELECT time_slot_id FROM time_slot)));

  • A subquery in the check clause

  • Assertion:

    • Express a condition that the DB has to always satisfy.

    • CREATE ASSERTION credits_earned_constraint CHECK (NOT EXISTS (SELECT ID FROM student WHERE tot_cred <> (SELECT COALESCE(SUM(credits), 0) FROM takes NATURAL JOIN course WHERE student.ID = takes.ID AND grade IS NOT NULL AND grade <> 'F')))

SQL Data Types

  • Date and Time Data Types:

    • Date: year, month, and date (e.g., DATE '2005-7-27').

    • Time: hours, minutes, and seconds (e.g., TIME '09:00:30').

    • Timestamp: date + time (e.g., TIMESTAMP '2005-7-27 09:00:30').

Data Types (cont.)

  • Example table student_date_time:

    • Includes date and time columns.

  • Date/time comparisons:

    • Date separators "-" can be omitted in some systems (e.g., SQL Server).

Data Types: Type Conversion

  • CAST() function: CAST (exp AS type)

    • Convert an expression exp to type type.

    • Example: Converting ID (VARCHAR) to NUMERIC for sorting in the instructor table.

      • SELECT CAST(ID AS NUMERIC(5)) AS inst_id FROM instructor ORDER BY inst_id

Note: 1st non-null value

  • COALESCE() function: COALESCE(arg1, . . . , argn)

    • Returns the 1st non-null argument.

    • All arguments must be of the same type.

    • Example: Retrieve instructor IDs and salaries, showing null salaries as 0.

      • SELECT ID, COALESCE(salary, 0) AS salary FROM instructor

Note: Match & Return a value

  • DECODE() function (Oracle):

    • DECODE(expression, match1, return1, match2, return2, . . ., matchn, returnn, default)

    • Evaluates the expression and returns the corresponding return value for the first matching match value, or the default if no match is found.

    • Example: Show null salaries as 'N/A'.

      • SELECT ID, DECODE (salary, NULL, 'N/A', salary) AS salary FROM instructor

  • SQL standard to use similar functionality: CASE WHEN

Data Types: Large-Object Types

  • Large Objects: photos, videos, CAD files, etc.

  • Large object data types:

    • BLOB (Binary Large Object): large collection of uninterpreted binary data.

    • CLOB (Character Large Object): large collection of character data.

  • When a query returns a large object, a locator (pointer) is returned.

Data Types: User-Defined Type (UDT)

  • CREATE TYPE statement: CREATE TYPE <UDT> AS <type-definition>

    • Creates User-Defined Type (UDT).

  • Uses of UDT:

    • Assigning a value of type Dollars to a variable of type Pounds: error!

    • budget + 20: error!

    • cast(budget as numeric(12,2)) + 20: OK

Create a Table using a Query Result

  • CREATE TABLE .. AS

    • Create a table that has a query result and store the result of a query as a new table.

    • CREATE TABLE biologyStudent AS SELECT * FROM student WHERE dept_name = 'Biology'

Index

  • An index on an attribute of a relation is a data structure for efficiently finding tuples in the relation that have a specified value for that attribute.

  • CREATE INDEX command: CREATE INDEX <name> ON <relation-name> (attribute)

Index (cont.)

  • Example:

    • Index on dept_name in the relation student.

    • Index on ID in the relation student.

  • Tree-type index (e.g., B+-tree) and hashing can be used.

  • Hash function example: f(x) = x mod 8.

Index (cont.)

  • Example:

    • CREATE TABLE student (ID VARCHAR (5), name VARCHAR (20) NOT NULL, dept_name VARCHAR (20), tot_cred NUMERIC (3,0) DEFAULT 0, PRIMARY KEY (ID));

    • CREATE INDEX studentID_index ON student(ID);

Index (cont.)

  • Advantages of indexes:

    • Can speed up query processing.

  • Disadvantages of indexes:

    • May make modifications (insertion, deletion, update) more complex and time-consuming because of index management.

Authorization

  • Authorization is the permission to use resources.

  • Can be granted on parts of the database (read, insert, delete, update data).

  • Privilege is the right to perform a particular action.

Granting Privileges

  • GRANT <privilege-list> ON <relation OR view> TO <user-list>

  • USER-LIST:

    • A user-id

    • Public: all valid users

Granting Privileges (cont.)

  • Privileges in <privilege-list>:

    • SELECT: Allows read access to relation or the ability to query using the view.

    • INSERT, DELETE, UPDATE: Ability to insert/delete/update tuples.

    • ALL [PRIVILEGES]: Short form for all allowable privileges.

Revoking Privileges

  • REVOKE <privilege-list> ON <relation OR view> FROM <user-list>

  • <privilege-list>:

    • ALL [PRIVILEGES]: All privileges the <user-list> may hold.

Roles

  • Role: A particular position, or a part performed in a particular process.

    • Each role has an associated group of users.

    • Each role has an associated set of privileges.

  • CREATE ROLE <name>

Roles (cont.)

  • Grant privileges to roles:

    • GRANT insert, update ON teaches to instructor_role;

    • GRANT select ON takes to teaching_assistant;

  • Grant a role to users:

    • GRANT instructor_role to Amit, James;

  • Grant a role to other roles:

    • GRANT teaching_assistant to instructor_role;

Authorization on Views

  • Example:

    • CREATE VIEW geo_instructor AS (SELECT * FROM instructor WHERE dept_name = 'Geology');

    • GRANT select ON geo_instructor TO geo_staff

  • View creators have only privileges on that view that they already had on the underlying tables.

Authorization on Schema

  • The reference privilege:

    • Privilege to declare a foreign key when creating a relation.

  • Why the reference privilege?

    • A foreign-key constraint can restrict deletion and update operations on the referenced relation.

Transfer of privileges

  • WITH GRANT OPTION clause:

    • GRANT select ON department TO Amit WITH GRANT OPTION;

    • Allows Amit to grant this privilege to others.

  • Authorization graph:

    • The passing of a specific authorization from one user to another can be represented by a graph.

Revoking Privileges

  • Revoking privileges:

    • There can be other users' privileges that depend on the revoked privilege.

      • Cascading revocation: Causes other users to lose that privilege.

        • REVOKE select ON department FROM Amit CASCADE;

      • Prevents cascading revocation:

        • Returns an error if there are cascading revocations.

        • REVOKE select ON department FROM Amit RESTRICT;

Revoking Privileges: Authorization graph

  • Privilege granted several times

  • A user may be granted the same privilege several times by different grantors The user retains the privilege until the revocation from its grantor i.e., the privilege is retained until revocations from all the grantors

Note: DB-Engines Ranking

  • DB-Engines lists 424 different database management systems, which are classified according to their database model (e.g. relational DBMS, key-value stores etc.).

  • This pie-chart shows the number of systems in each category. Some of the systems belong to more than one category.

Note: DBMS popularity (cont.)

  • This chart shows the popularity of each category. It is calculated with the popularity (i.e. the ranking scores) of all individual systems per category.

  • The sum of all ranking scores is 100%.