MD

Query Processing Notes

Query Processing

Database Systems (H) - Dr Chris Anagnostopoulos

Roadmap

  • External Sorting: Fundamental Operator
  • Strategies for SELECT
    • Simple, conjunctive, disjunctive
  • Strategies for JOIN
    • Five fundamental algorithms for JOIN
  • Principle: Estimate the cost of each plan, choose the plan with the minimum expected cost, and then execute!

Fundamental Tool: Sorting

  • Almost all SQL queries involve sorting of tuples with respect to ad-hoc sorting requests defined by the user.
    • CREATE PRIMARY INDEX ON EMPLOYEE(SSN) means sort by SSN.
    • ORDER BY Name means sort by Name.
    • SELECT DISTINCT Salary means sort by Salary to create clusters and then identify the distinct values.
    • SELECT DNO, COUNT (*) FROM EMPLOYEE GROUP BY DNO means sort by DNO to create clusters.
  • Fundamental Limitation: We cannot store the entire relation into memory for sorting the records (e.g., bubble sort, quick sort, heap sort, merge sort).
  • External Sorting: Sorting algorithm for large relations stored on disk that do not fit entirely in main memory.

External Sorting: Overview

  • Principle: Divide & Sort (Conquer)
    • Divide: A file of b blocks into L smaller sub-files (b/L blocks each).
    • Sort: Load each small sub-file to memory, sort using, e.g., quick-sort, bubble-sort and write it back to the disk.
    • Merge: Merge two (or more) sorted sub-files loaded from disk in memory creating bigger sorted sub-files, that are merged in turn.

External Sorting: Overview (Lemma 1)

  • Lemma 1: The expected cost of the sort-merge strategy in block accesses is: 2b.(1 + log_M(L)). Where:
    • b is the number of file blocks.
    • M: degree of merging, i.e., number of sorted blocks merged in each loop.
    • L: number of the initial sorted sub-files (before entering merging phase).
  • Proof: (See Knuth, Donald (1998). "Chapter 5.4.1. Multiway Merging and Replacement Selection". Sorting and Searching. The Art of Computer Programming. 3 (2nd ed.). Addison-Wesley. pp. 158–168.)
  • M = 2 gives the worst-case performance because it merges in parallel only a pair of blocks at each step.
  • M > 2: merge more than two blocks at each step; (M-way merging).

Strategies for SELECT

  • Given: SELECT * FROM relation WHERE selection-conditions
  • S1: Linear search over a key
    • Retrieve every record; test whether it satisfies the selection condition.
    • SELECT * FROM EMPLOYEE WHERE SSN = 1234
    • Expected Cost: b/2
  • S2: Binary Search over a key
    • SELECT * FROM EMPLOYEE WHERE SSN = 1234
    • Expected Cost (unsorted file): log2(b) + 2b(1 + logM(L))
    • Expected Cost (sorted file): log_2(b)
  • S3: Primary Index or Hash Function over a key
    • SELECT * FROM EMPLOYEE WHERE SSN = 1234
    • Precondition (Index): Primary Index of level t over key (sorted by key)
    • Precondition (Hash): File hashed with the key
    • Expected Cost (sorted file): t + 1
    • Expected Cost (hashed file): 1 + n/2 (n = #overflow blocks < b)
  • S4: Hash Function over a non-key
    • SELECT * FROM EMPLOYEE WHERE DNO = 10
    • Precondition (Hash): File hashed with the non-key
    • Expected Cost: 1 + n (n = #overflow blocks having the non-key)
  • S5: Primary Index over a key in a range query: $>>, \geq, <, \leq
    • Use Index to find the record satisfying the equality (e.g., SSN = 9) and retrieve all subsequent blocks from the ordered file.
    • SELECT * FROM EMPLOYEE WHERE SSN > 8;
    • Precondition: Primary Index of level t over the sorted by key
    • Expected Cost (sorted file): t + O(b)
    • Estimated Cost (sorted file): t + P(X > x) . b = t + b . (1– x/r)
      • P(X = x): probability of SSN = x = 1/r
      • r = number of employees = b . f
      • f = employees/block
      • P(X > x) = 1 – P(X \leq x) = 1 - (1/r + … + 1/r) = 1 – x/r
  • S6: Clustering Index over ordering, non-key
    • Retrieve all contiguous blocks of the cluster.
    • SELECT * FROM EMPLOYEE WHERE DNO = 5;
    • Precondition: Clustering Index of level t on non-key (file sorted by non-key)
    • Expected cost (sorted file): t + O(b/n)
    • Estimation: t + b/n or t + P(X = x) . b
      • Note 1: n = #distinct values of the non-key attribute
      • Note 2: Attribute is uniformly distributed, i.e., P(X = x) = 1/n
  • S7: Secondary Index (B+ Tree) over non-ordering key
    • SELECT * FROM DEPARTMENT WHERE MGR_SSN = 1234
    • Precondition: File is not ordered by key.
    • Expected Cost: t + 1
    • B+ Leaf Node points at the unique block
  • S8: Secondary Index (B+ Tree) over non-ordering, non-key
    • Note: retrieve multiple records from different blocks having the same value.
    • SELECT * FROM EMPLOYEE WHERE SALARY = 40000;
    • Precondition: File is not ordered by non-key
    • Expected Cost: t + m + O(b)
      • Note 1: B+ Leaf Node points to the first block of a cluster of blocks storing pointers to data blocks with Salary = 40K (2 levels of indirection)
      • Note 2: m is number of blocks with block pointers (m \geq 1)

Strategies for Disjunctive SELECT

  • Disjunctive Selections: conditions involving OR
  • SELECT * FROM EMPLOYEE WHERE SALARY > 10000 OR NAME LIKE ‘%Chris%’
  • Final result: contains tuples satisfying the union of all selection conditions
  • Methodology:
    • IF an access path exists, e.g., B+/hash/primary-index for all of the attributes:
      • use each to retrieve the set of records satisfying each condition
      • union all sets to get the final result.
    • ELSE if none or some of the attributes have an access path, linear search is unavoidable!

Strategies for Conjunctive SELECT

  • Conjunctive Selections: conditions involving AND
  • SELECT * FROM EMPLOYEE WHERE SALARY > 40000 AND NAME LIKE ‘%Chris%’
  • Methodology:
    • IF an access path exists (index) for an attribute, use it to retrieve the tuples satisfying the condition, e.g., Salary > 40000 [intermediate result]
    • GO through this intermediate result to check which record satisfies also the other condition(s), e.g., Name LIKE ‘%Chris%’ in memory!
    • If you have two indexes, which index is to be used first?
      • Answer: use the index that generates the smallest intermediate result set hoping to fit in the memory! [selectivity = #tuples retrieved]
    • Optimization: find the execution sequence of conditions that minimizes the expected cost.
    • Principle: Predict the selectivity beforehand!

Strategies for JOIN

  • Observation: the most resource-consuming operator!
  • Focus: two-way equijoin, i.e., join two relations with equality ‘=’
  • SELECT * FROM EMPLOYEE E, DEPARTMENT D WHERE E.DNO = D.DNUMBER
  • Five fundamental strategies for join processing:
    • Naïve join (no access path)
    • Nested-loop join (no access path)
    • Index-based nested-loop join (index; B+ Trees)
    • Merge-join (sorted relations)
    • Hash-join (hashed relations)
  • SELECT * FROM R, S WHERE R.A = S.B --A and B are join attributes, e.g., PK, FK.
  • Step 1: Compute the Cartesian product of R and S, i.e., all tuples from R are concatenated (combined) with all tuples from S.
  • Step 2: Store the result in a file T and for each concatenated tuple t = (r, s) with r \in R and s \in S check if r.A = s.B
  • Algorithm Naïve Join
    • T = Cartesian R x S
    • Scan T, a tuple t \in T at a time: t = (r, s)
    • If r.A = s.B then add (r, s) to the result file
    • Go to next tuple t \in T.
  • Outcome: inefficient, typically the result is a small subset of the Cartesian!
  • What-If: no tuples are actually matched; predict the matching tuples in advance!

Nested-Loop Join

  • Algorithm Nested-Loop Join
    • For each tuple r \in R //outer relation
    • For each tuple s \in S //inner relation
    • If r.A = s.B then add (r, s) to the result file;
  • Note: the outer & inner loops are over blocks and not over tuples!
  • Note: Re-form the pseudocode in a block-centric programming mode (system programming using files)
  • Challenge 1: Which relation should be in the outer loop and which in the inner loop to minimize the join processing cost?

Nested-Loop Join: Algorithm

  • Step 1:
    • LOAD a set (chunk) of blocks from the outer relation R.
    • LOAD one block from inner relation S
    • Maintain an output buffer for the matching (resulting) tuples (r, s): r.A = s.B
  • Step 2:
    • JOIN the S block with each R block from the chunk
    • FOR each matching tuple r \in R-block and s \in S-block ADD (r, s) to Output Buffer
    • IF Outer Buffer is full, PAUSE; WRITE the current join result to disk; CONTINUE
  • Step 3: LOAD next S-block and GOTO Step 2
  • Step 4: GOTO Step 1

Nested-Loop Join: Cost

  • SELECT * FROM EMPLOYEE E, DEPARTMENT D WHERE E.DNO = D.DNUMBER
  • Employee (E): nE blocks used at the outer loop
  • Department (D): nD blocks used at the inner loop
  • Memory: nB blocks available:
    • 1 block for reading the inner file D,
    • 1 block for writing the join result,
    • nB-2 blocks for reading the outer file E: chunk size.
  • Observation 1: Each block of the outer relation E is read once.
  • Observation 2: The whole inner relation D is read every time we read a chunk of (nB-2) blocks of E.

Nested-Loop Join: Cost (Continued)

  • Total number of blocks read for outer relation E: nE
  • Outer Loops: Number of chunks of (nB-2) blocks of outer relation: ceil(nE/(nB-2))
  • For each chunk of (nB-2) blocks read all the blocks of inner relation D:
  • Total number of block read in all outer loops: nD*ceil(nE/(nB-2))
  • Total Expected Cost: nE + nD * ceil(nE/(nB-2)) block accesses
  • Example: nE = 2,000 blocks; nD = 10 blocks; nB = 7 blocks
    • Strategy Cost 1: (E outer; D inner) nE + nD * ceil(nE/(nB-2)) = 6,000 block accesses
    • Strategy Cost 2: (D outer; E inner) nD + nE * ceil(nD/(nB-2)) = 4,010 block accesses
  • Note: The file with fewer blocks goes to the outer loop.

Index-Based Nested-Loop Join

  • Idea: Use of an index on either A or B joining attributes: R.A = S.B.
  • Focus: Assume an index I on attribute B of relation S.
  • Algorithm Index-Based Nested-Loop Join
    • For each tuple r \in R
    • Use index of B: I(r.A), to retrieve all tuples s \in S having s.B = r.A
    • For each such tuple s \in S, add matching tuple (r, s) to the result file;
  • Claim: Much faster compared to the nested-loop join, why?
    • Because: We get immediate access on s \in S with s.B = r.A by searching for r.A using the index I, avoiding linear search on S.
  • Challenge 2: Which index to use to minimize the join processing cost?

Index-Based Nested-Loop: Cost

  • SELECT * FROM EMPLOYEE E, DEPARTMENT D WHERE D.MGR_SSN = E.SSN
  • B+ Tree on Mgr_Ssn with level xD = 2
  • B+ Tree on SSN with level xE = 4
  • E: rE = 6000 tuples; nE = 2,000 blocks; D: rD = 50 tuples; nD = 10 blocks
  • Strategy 1: Employee e, use B+ Tree (MgrSsn) to find department d: e.Ssn = d.MgrSsn.
    • Observation: not all employees are managers; –search without meaning sometimes…
    • Probability an employee being manager: 50/6000 = 0.83% (99.16% meaningless searches)
    • Strategy Cost 1: nE + rE*(xD + 1) = 20,000 block accesses;
  • Strategy 2: Department d, use B+ Tree (SSN) to find Employee e: e.Ssn = d.Mgr_Ssn
    • Observation: every department has one manager –search is fruitful…
    • Probability a manager being an employee is 100% ☺
    • Strategy Cost 2: nD + rD*(xE + 1) = 260 block accesses;
  • Huge difference (20,000 vs 260 block accesses):
    • every record in Department is joined with exactly one record in Employee (manager)
    • only some employees from Employee are managers of departments…
  • Lesson Learnt: Use the index built on the PK
  • Note: not for recursive relationships, e.g., employee-supervisor

Sort-Merge Join

  • Idea: Use of the merge-sort algorithm over two sorted relations w.r.t. their joining attributes.
  • Pre-condition: Relations R and S are physically ordered on their joining A and B;
  • Methodology:
    • Step 1: Load a pair {R.block, S.block} of sorted blocks into the memory;
    • Step 2: Both blocks are linearly scanned concurrently over the joining attributes (sort-merge algorithm in memory);
    • Step 3: If matching tuples found then store them in a buffer.
  • Gain: The blocks of each file are scanned only once!
  • But: If R and S are not a-priori physically ordered on A and B then sort them first!

Sort-Merge Join: Example

R

  • A sname rating age
  • 22 dustin 7 45.0
  • 28 yuppy 9 35.0
  • 44 guppy 5 35.0
  • 58 rusty 10 35.0

S

  • B bid day rname
  • 28 103 12/4/96 guppy
  • 28 103 11/3/96 yuppy
  • 31 101 10/10/96 dustin
  • 31 102 10/12/96 lubber
  • 31 101 10/11/96 lubber
  • 58 103 11/12/96 dustin

Result Buffer

  • (28,yuppy,9,35,103,12/4/96,guppy)
  • (28,yuppy,9,35,103,11/3/96,yuppy)
  • (58,rusty,10,35,103,11/12/96,dustin)

Sort-Merge-Join: Cost

  • Requirement: Efficient if both Employee E and Department D are already sorted by their joining attributes: SSN and Mgr_Ssn.
  • Observation: only a single pass is made for each file.
  • Strategy Cost: nE + nD = 2,010 block accesses.
  • IF both files are not sorted THEN use external sorting!
    • Strategy Cost 1: External sorting (2-way merge): 2.nE + 2. nE .log_2(ceil(nE / nB))
      • ceil(nE / nB): number of initial sorted sub-files (each sub-file is nB blocks)
      • nB: number of available memory blocks.
    • Strategy Cost 2: External sorting (2-way merge): 2.nD + 2. nD .log_2(ceil(nD / nB))
  • Total Strategy Cost:
    • nE + nD + 2.nE + 2. nE .log2(ceil(nE / nB)) + 2.nD + 2. nD .log2(ceil(nD / nB))
  • Example: nE = 2,000 blocks; nD = 10 blocks; nB = 7 blocks, we get:
    • 2010 + 4000 + 4000 log(286) + 20 + 20 log(2) = 38,690 block accesses; only 5.1% is devoted to join!
  • Think before sort only for joining purposes!

Hash-Join

  • Pre-condition:
    • File R is partitioned into M buckets w.r.t. hash function h over joining attribute A.
    • File S is also partitioned into M buckets w.r.t. the same hash function h over attribute B;
  • Assumption: R is the smallest file and fits into main memory: M buckets of R are in memory.
  • Algorithm Hash-Join
    • Partitioning phase
      • For each tuple r \in R,
      • Compute y = h(r.A) /* address of bucket*/
      • Place tuple r into bucket y = h(r.A) in memory
    • Probing phase
      • For each tuple s \in S,
      • Compute y = h(s.B) /use the same hash function h/
      • Find the bucket y = h(s.B) in memory (of the R partition).
      • For each tuple r \in R in the bucket y = h(s.B)
      • If s.B = r.A add (r, s) to the result file; /join/

Hash-Join (Partitioning and Probing phases)

Partitioning Phase

  • Partition of R over attribute A using hash h(A) = A \mod M into M buckets.
  • M+1 main memory buffers

Probing Phase

  • Hashing each tuple s from S, using hash h(s.B) = s.B \mod M to identify the y = h(s.B) bucket in memory.

Hash-Join: Cost

  • Best Case: Memory nB > nD + 2
    • nD: blocks for the smallest of the two relations (e.g., Department)
    • Whole relation Department fits in memory and is hashed into M buckets.
    • Each Employee tuple is loaded and hashed on joining attribute SSN.
    • The corresponding bucket is found and searched for a matching tuple.
    • The result is stored in another buffer (that’s why nB > nD + 2)
  • Best Case: nE + nD block accesses.
  • Normal Case (smallest relation cannot fit in memory): 3(nE + nD) block accesses

Put-All-Together: Join Cost Prediction

  • Naïve Join Cost: nE * nD: 20,000 block accesses
  • Nested-Loop Cost (best): nD + nE * ceil(nD/(nB-2)): 4,010 block accesses
  • Index-based Nested-Loop Cost (best): nD + rD*(xE + 1): 260 block accesses
  • Sort-Merge Cost (already sorted): nE + nD: 2,010 block accesses
  • Hash-Join Normal-Case Cost: 3(nE + nD): 6,030 block accesses
  • Hold on a second: the cost for writing the result-set buffer (block) from memory to disk in each strategy is not yet considered!
  • How many blocks are written? How many tuples are matched?…next weeks

So Far…

  • Naïve Join: Exploit nothing. Cartesian product and then check…
  • Nested-Loop: Exploit nothing. Computing-oriented join.
    • Which relation should be in the outer loop? Influences the join cost.
    • Can you predict the cost then? Optimization…
  • Index-based Nested-Loop: Exploit at least one index. Use index to find the matched tuples as quick as possible ☺
    • If we have two indexes (over R.A and over S.B), which one to use? Influences the join cost! Optimization…
  • Merge-Join: Exploit both ordered relations; otherwise; sort them 
  • Hash-Join: Exploit hashing. Hash one relation.
    • Use the same hashing function to find the matching tuples in the same bucket ☺

Use Case I

  • SELECT D.NAME, E.NAME FROM EMPLOYEE E, DEPARTMENT D WHERE E.DNO = D.DNUMBER
  • Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
  • Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
  • Memory: nB = 22 blocks, bfrE = 2 employees per block
  • Task: propose 3 strategies and choose the best.

Use Case I (Continued)

  • Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
  • Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
  • Memory: nB = 22 blocks, bfrE = 2 employees per block
  • Strategy 1: Use Clustering Index (DNO). For each department d, find details of its employees: e.DNO
    • Cost for accessing a department in Clustering Index: log_2(nC) = 2 block accesses
    • 2000 employees in 500 departments, i.e., m = 4 employees per department
    • 2 blocks per department (m / bfrE)
    • Cost-Clustering-Index = log_2(nC) + m / bfrE = 2 + 2 = 4 block accesses
    • Cost-1: nD + rD*(log_2(nC) + m / bfrE) = 2010 block accesses

Use Case I (Continued 2)

  • Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
  • Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
  • Memory: nB = 22 blocks, bfrE = 2 employees per block
  • Strategy 2: Department relation fits in memory; Hash-join (M = 10 buckets).
    • Hash Department and store into memory: nD = 10 block accesses.
    • Map each employee to a bucket in main memory and search within the bucket.
    • Partitioning phase cost: nD = 10 block accesses.
    • Probing phase cost: nE = 100 block accesses.
    • Cost-2: nD + nE = 110 block accesses

Use Case I (Continued 3)

  • Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
  • Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
  • Memory: nB = 22 blocks, bfrE = 2 employees per block
  • Strategy 3: Department relation is smaller than employee; nested-loop join
    • Outer relation: Department
    • Cost-3: nD + nE * ceil(nD/(nB-2)) = 10 + 100*(10/20) = 60 block accesses
  • Strategy 1 (index-based join): 2010 block accesses
  • Strategy 2 (hash-join): 110 block accesses
  • Strategy 3 (nested-loop join): 60 block accesses

Use Case II

  • SELECT E.NAME, S.NAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPER_SSN = S.SSN
  • Context:
    • b = 2,000 blocks;
    • r = 10,000 records (employees);
    • B+ Index over SSN of level x = 5
    • B+ Index over Super_SSN of level y = 2
    • 10% of employees are supervisors;
    • a supervisor does not have any supervisor: Super_SSN is NULL;
  • Task: Propose a plan that minimizes the expected cost using Index- based Nested-loop.

Use Case II (Continued)

  • SELECT E.NAME, S.NAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPER_SSN = S.SSN
  • Facts (Solution 1):
    • PK is SSN & FK is Super_SSN.
    • Use the B+ Index over SSN (ssn-index).
    • Scan relation Employee once, i.e., b = 2000 block accesses
    • For each block from Employee:
      • For each tuple e check if this employee is a supervisor, i.e., Super_SSN is NULL
      • IF employee e is NOT supervisor (w.p. 90%)
        • THEN use ssn-index(e.super_ssn)
      • ELSE go to next employee
    • Total Cost: b + 0.9r(x+1) = 56,000 block accesses

Use Case II (Continued 2)

  • SELECT E.NAME, S.NAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPER_SSN = S.SSN
  • Facts (Solution 2):
    • PK is SSN & FK is Super_SSN.
    • Use the B+ Index over Super_SSN (super-index).
    • Scan relation Employee once, i.e., b = 2000 block accesses
    • For each block:
      • For each tuple e check if this employee is a supervisor, i.e., Super_SSN is NULL
      • IF employee e is NOT supervisor (w.p. 90%)
        • THEN use super-index(e.super_ssn)
      • ELSE go to next employee
    • Total Cost: b + 0.9r(y+1) = 29,000 block accesses (48% faster)

Hash-Join: Cost (Sketch) OPTIONAL

  • Normal Case: The smallest relation cannot fit in memory.
  • Partitioning Phase
    • Read both relations E & D first (one block at a time);
    • Partial Cost: nE + nD
    • Partition into M buckets using with the same hashing function h,
    • The M main buckets fit in memory; overflow blocks in disk!
    • Store the hashed buckets of each relation to the disk.
    • Partial Cost: nE + nD
  • Probing Phase
    • For each m =1…M bucket Do
    • Read a pair: the m-th bucket from E and the m-th bucket from D
    • Partial Cost: nE + nD
    • Perform join focusing only on the tuples from the same bucket m
    • Idea: e might be matched with d since h(e) = h(d) = m
  • Expected Cost: 3(nE + nD)$$ block accesses.