Query Processing Notes
Query Processing
Database Systems (H) - Dr Chris Anagnostopoulos
Roadmap
- External Sorting: Fundamental Operator
- Strategies for SELECT
- Simple, conjunctive, disjunctive
- Strategies for JOIN
- Five fundamental algorithms for JOIN
- Principle: Estimate the cost of each plan, choose the plan with the minimum expected cost, and then execute!
Fundamental Tool: Sorting
- Almost all SQL queries involve sorting of tuples with respect to ad-hoc sorting requests defined by the user.
CREATE PRIMARY INDEX ON EMPLOYEE(SSN)
means sort by SSN.ORDER BY Name
means sort by Name.SELECT DISTINCT Salary
means sort by Salary to create clusters and then identify the distinct values.SELECT DNO, COUNT (*) FROM EMPLOYEE GROUP BY DNO
means sort by DNO to create clusters.
- Fundamental Limitation: We cannot store the entire relation into memory for sorting the records (e.g., bubble sort, quick sort, heap sort, merge sort).
- External Sorting: Sorting algorithm for large relations stored on disk that do not fit entirely in main memory.
External Sorting: Overview
- Principle: Divide & Sort (Conquer)
- Divide: A file of b blocks into L smaller sub-files (b/L blocks each).
- Sort: Load each small sub-file to memory, sort using, e.g., quick-sort, bubble-sort and write it back to the disk.
- Merge: Merge two (or more) sorted sub-files loaded from disk in memory creating bigger sorted sub-files, that are merged in turn.
External Sorting: Overview (Lemma 1)
- Lemma 1: The expected cost of the sort-merge strategy in block accesses is: 2b.(1 + log_M(L)). Where:
- b is the number of file blocks.
- M: degree of merging, i.e., number of sorted blocks merged in each loop.
- L: number of the initial sorted sub-files (before entering merging phase).
- Proof: (See Knuth, Donald (1998). "Chapter 5.4.1. Multiway Merging and Replacement Selection". Sorting and Searching. The Art of Computer Programming. 3 (2nd ed.). Addison-Wesley. pp. 158–168.)
- M = 2 gives the worst-case performance because it merges in parallel only a pair of blocks at each step.
- M > 2: merge more than two blocks at each step; (M-way merging).
Strategies for SELECT
- Given:
SELECT * FROM relation WHERE selection-conditions
- S1: Linear search over a key
- Retrieve every record; test whether it satisfies the selection condition.
SELECT * FROM EMPLOYEE WHERE SSN = 1234
- Expected Cost: b/2
- S2: Binary Search over a key
SELECT * FROM EMPLOYEE WHERE SSN = 1234
- Expected Cost (unsorted file): log2(b) + 2b(1 + logM(L))
- Expected Cost (sorted file): log_2(b)
- S3: Primary Index or Hash Function over a key
SELECT * FROM EMPLOYEE WHERE SSN = 1234
- Precondition (Index): Primary Index of level t over key (sorted by key)
- Precondition (Hash): File hashed with the key
- Expected Cost (sorted file): t + 1
- Expected Cost (hashed file): 1 + n/2 (n = #overflow blocks < b)
- S4: Hash Function over a non-key
SELECT * FROM EMPLOYEE WHERE DNO = 10
- Precondition (Hash): File hashed with the non-key
- Expected Cost: 1 + n (n = #overflow blocks having the non-key)
- S5: Primary Index over a key in a range query: $>>, \geq, <, \leq
- Use Index to find the record satisfying the equality (e.g., SSN = 9) and retrieve all subsequent blocks from the ordered file.
SELECT * FROM EMPLOYEE WHERE SSN > 8;
- Precondition: Primary Index of level t over the sorted by key
- Expected Cost (sorted file): t + O(b)
- Estimated Cost (sorted file): t + P(X > x) . b = t + b . (1– x/r)
- P(X = x): probability of SSN = x = 1/r
- r = number of employees = b . f
- f = employees/block
- P(X > x) = 1 – P(X \leq x) = 1 - (1/r + … + 1/r) = 1 – x/r
- S6: Clustering Index over ordering, non-key
- Retrieve all contiguous blocks of the cluster.
SELECT * FROM EMPLOYEE WHERE DNO = 5;
- Precondition: Clustering Index of level t on non-key (file sorted by non-key)
- Expected cost (sorted file): t + O(b/n)
- Estimation: t + b/n or t + P(X = x) . b
- Note 1: n = #distinct values of the non-key attribute
- Note 2: Attribute is uniformly distributed, i.e., P(X = x) = 1/n
- S7: Secondary Index (B+ Tree) over non-ordering key
SELECT * FROM DEPARTMENT WHERE MGR_SSN = 1234
- Precondition: File is not ordered by key.
- Expected Cost: t + 1
- B+ Leaf Node points at the unique block
- S8: Secondary Index (B+ Tree) over non-ordering, non-key
- Note: retrieve multiple records from different blocks having the same value.
SELECT * FROM EMPLOYEE WHERE SALARY = 40000;
- Precondition: File is not ordered by non-key
- Expected Cost: t + m + O(b)
- Note 1: B+ Leaf Node points to the first block of a cluster of blocks storing pointers to data blocks with Salary = 40K (2 levels of indirection)
- Note 2: m is number of blocks with block pointers (m \geq 1)
Strategies for Disjunctive SELECT
- Disjunctive Selections: conditions involving OR
SELECT * FROM EMPLOYEE WHERE SALARY > 10000 OR NAME LIKE ‘%Chris%’
- Final result: contains tuples satisfying the union of all selection conditions
- Methodology:
- IF an access path exists, e.g., B+/hash/primary-index for all of the attributes:
- use each to retrieve the set of records satisfying each condition
- union all sets to get the final result.
- ELSE if none or some of the attributes have an access path, linear search is unavoidable!
- IF an access path exists, e.g., B+/hash/primary-index for all of the attributes:
Strategies for Conjunctive SELECT
- Conjunctive Selections: conditions involving AND
SELECT * FROM EMPLOYEE WHERE SALARY > 40000 AND NAME LIKE ‘%Chris%’
- Methodology:
- IF an access path exists (index) for an attribute, use it to retrieve the tuples satisfying the condition, e.g., Salary > 40000 [intermediate result]
- GO through this intermediate result to check which record satisfies also the other condition(s), e.g., Name LIKE ‘%Chris%’ in memory!
- If you have two indexes, which index is to be used first?
- Answer: use the index that generates the smallest intermediate result set hoping to fit in the memory! [selectivity = #tuples retrieved]
- Optimization: find the execution sequence of conditions that minimizes the expected cost.
- Principle: Predict the selectivity beforehand!
Strategies for JOIN
- Observation: the most resource-consuming operator!
- Focus: two-way equijoin, i.e., join two relations with equality ‘=’
SELECT * FROM EMPLOYEE E, DEPARTMENT D WHERE E.DNO = D.DNUMBER
- Five fundamental strategies for join processing:
- Naïve join (no access path)
- Nested-loop join (no access path)
- Index-based nested-loop join (index; B+ Trees)
- Merge-join (sorted relations)
- Hash-join (hashed relations)
Naïve Join
SELECT * FROM R, S WHERE R.A = S.B
--A and B are join attributes, e.g., PK, FK.- Step 1: Compute the Cartesian product of R and S, i.e., all tuples from R are concatenated (combined) with all tuples from S.
- Step 2: Store the result in a file T and for each concatenated tuple t = (r, s) with r \in R and s \in S check if r.A = s.B
- Algorithm Naïve Join
- T = Cartesian R x S
- Scan T, a tuple t \in T at a time: t = (r, s)
- If r.A = s.B then add (r, s) to the result file
- Go to next tuple t \in T.
- Outcome: inefficient, typically the result is a small subset of the Cartesian!
- What-If: no tuples are actually matched; predict the matching tuples in advance!
Nested-Loop Join
- Algorithm Nested-Loop Join
- For each tuple r \in R //outer relation
- For each tuple s \in S //inner relation
- If r.A = s.B then add (r, s) to the result file;
- Note: the outer & inner loops are over blocks and not over tuples!
- Note: Re-form the pseudocode in a block-centric programming mode (system programming using files)
- Challenge 1: Which relation should be in the outer loop and which in the inner loop to minimize the join processing cost?
Nested-Loop Join: Algorithm
- Step 1:
- LOAD a set (chunk) of blocks from the outer relation R.
- LOAD one block from inner relation S
- Maintain an output buffer for the matching (resulting) tuples (r, s): r.A = s.B
- Step 2:
- JOIN the S block with each R block from the chunk
- FOR each matching tuple r \in R-block and s \in S-block ADD (r, s) to Output Buffer
- IF Outer Buffer is full, PAUSE; WRITE the current join result to disk; CONTINUE
- Step 3: LOAD next S-block and GOTO Step 2
- Step 4: GOTO Step 1
Nested-Loop Join: Cost
SELECT * FROM EMPLOYEE E, DEPARTMENT D WHERE E.DNO = D.DNUMBER
- Employee (E): nE blocks used at the outer loop
- Department (D): nD blocks used at the inner loop
- Memory: nB blocks available:
- 1 block for reading the inner file D,
- 1 block for writing the join result,
- nB-2 blocks for reading the outer file E: chunk size.
- Observation 1: Each block of the outer relation E is read once.
- Observation 2: The whole inner relation D is read every time we read a chunk of (nB-2) blocks of E.
Nested-Loop Join: Cost (Continued)
- Total number of blocks read for outer relation E: nE
- Outer Loops: Number of chunks of (nB-2) blocks of outer relation: ceil(nE/(nB-2))
- For each chunk of (nB-2) blocks read all the blocks of inner relation D:
- Total number of block read in all outer loops: nD*ceil(nE/(nB-2))
- Total Expected Cost: nE + nD * ceil(nE/(nB-2)) block accesses
- Example: nE = 2,000 blocks; nD = 10 blocks; nB = 7 blocks
- Strategy Cost 1: (E outer; D inner) nE + nD * ceil(nE/(nB-2)) = 6,000 block accesses
- Strategy Cost 2: (D outer; E inner) nD + nE * ceil(nD/(nB-2)) = 4,010 block accesses
- Note: The file with fewer blocks goes to the outer loop.
Index-Based Nested-Loop Join
- Idea: Use of an index on either A or B joining attributes: R.A = S.B.
- Focus: Assume an index I on attribute B of relation S.
- Algorithm Index-Based Nested-Loop Join
- For each tuple r \in R
- Use index of B: I(r.A), to retrieve all tuples s \in S having s.B = r.A
- For each such tuple s \in S, add matching tuple (r, s) to the result file;
- Claim: Much faster compared to the nested-loop join, why?
- Because: We get immediate access on s \in S with s.B = r.A by searching for r.A using the index I, avoiding linear search on S.
- Challenge 2: Which index to use to minimize the join processing cost?
Index-Based Nested-Loop: Cost
SELECT * FROM EMPLOYEE E, DEPARTMENT D WHERE D.MGR_SSN = E.SSN
- B+ Tree on Mgr_Ssn with level xD = 2
- B+ Tree on SSN with level xE = 4
- E: rE = 6000 tuples; nE = 2,000 blocks; D: rD = 50 tuples; nD = 10 blocks
- Strategy 1: Employee e, use B+ Tree (MgrSsn) to find department d: e.Ssn = d.MgrSsn.
- Observation: not all employees are managers; –search without meaning sometimes…
- Probability an employee being manager: 50/6000 = 0.83% (99.16% meaningless searches)
- Strategy Cost 1: nE + rE*(xD + 1) = 20,000 block accesses;
- Strategy 2: Department d, use B+ Tree (SSN) to find Employee e: e.Ssn = d.Mgr_Ssn
- Observation: every department has one manager –search is fruitful…
- Probability a manager being an employee is 100% ☺
- Strategy Cost 2: nD + rD*(xE + 1) = 260 block accesses;
- Huge difference (20,000 vs 260 block accesses):
- every record in Department is joined with exactly one record in Employee (manager)
- only some employees from Employee are managers of departments…
- Lesson Learnt: Use the index built on the PK
- Note: not for recursive relationships, e.g., employee-supervisor
Sort-Merge Join
- Idea: Use of the merge-sort algorithm over two sorted relations w.r.t. their joining attributes.
- Pre-condition: Relations R and S are physically ordered on their joining A and B;
- Methodology:
- Step 1: Load a pair {R.block, S.block} of sorted blocks into the memory;
- Step 2: Both blocks are linearly scanned concurrently over the joining attributes (sort-merge algorithm in memory);
- Step 3: If matching tuples found then store them in a buffer.
- Gain: The blocks of each file are scanned only once!
- But: If R and S are not a-priori physically ordered on A and B then sort them first!
Sort-Merge Join: Example
R
- A sname rating age
- 22 dustin 7 45.0
- 28 yuppy 9 35.0
- 44 guppy 5 35.0
- 58 rusty 10 35.0
S
- B bid day rname
- 28 103 12/4/96 guppy
- 28 103 11/3/96 yuppy
- 31 101 10/10/96 dustin
- 31 102 10/12/96 lubber
- 31 101 10/11/96 lubber
- 58 103 11/12/96 dustin
Result Buffer
- (28,yuppy,9,35,103,12/4/96,guppy)
- (28,yuppy,9,35,103,11/3/96,yuppy)
- (58,rusty,10,35,103,11/12/96,dustin)
Sort-Merge-Join: Cost
- Requirement: Efficient if both Employee E and Department D are already sorted by their joining attributes: SSN and Mgr_Ssn.
- Observation: only a single pass is made for each file.
- Strategy Cost: nE + nD = 2,010 block accesses.
- IF both files are not sorted THEN use external sorting!
- Strategy Cost 1: External sorting (2-way merge): 2.nE + 2. nE .log_2(ceil(nE / nB))
- ceil(nE / nB): number of initial sorted sub-files (each sub-file is nB blocks)
- nB: number of available memory blocks.
- Strategy Cost 2: External sorting (2-way merge): 2.nD + 2. nD .log_2(ceil(nD / nB))
- Strategy Cost 1: External sorting (2-way merge): 2.nE + 2. nE .log_2(ceil(nE / nB))
- Total Strategy Cost:
- nE + nD + 2.nE + 2. nE .log2(ceil(nE / nB)) + 2.nD + 2. nD .log2(ceil(nD / nB))
- Example: nE = 2,000 blocks; nD = 10 blocks; nB = 7 blocks, we get:
- 2010 + 4000 + 4000 log(286) + 20 + 20 log(2) = 38,690 block accesses; only 5.1% is devoted to join!
- Think before sort only for joining purposes!
Hash-Join
- Pre-condition:
- File R is partitioned into M buckets w.r.t. hash function h over joining attribute A.
- File S is also partitioned into M buckets w.r.t. the same hash function h over attribute B;
- Assumption: R is the smallest file and fits into main memory: M buckets of R are in memory.
- Algorithm Hash-Join
- Partitioning phase
- For each tuple r \in R,
- Compute y = h(r.A) /* address of bucket*/
- Place tuple r into bucket y = h(r.A) in memory
- Probing phase
- For each tuple s \in S,
- Compute y = h(s.B) /use the same hash function h/
- Find the bucket y = h(s.B) in memory (of the R partition).
- For each tuple r \in R in the bucket y = h(s.B)
- If s.B = r.A add (r, s) to the result file; /join/
- Partitioning phase
Hash-Join (Partitioning and Probing phases)
Partitioning Phase
- Partition of R over attribute A using hash h(A) = A \mod M into M buckets.
- M+1 main memory buffers
Probing Phase
- Hashing each tuple s from S, using hash h(s.B) = s.B \mod M to identify the y = h(s.B) bucket in memory.
Hash-Join: Cost
- Best Case: Memory nB > nD + 2
- nD: blocks for the smallest of the two relations (e.g., Department)
- Whole relation Department fits in memory and is hashed into M buckets.
- Each Employee tuple is loaded and hashed on joining attribute SSN.
- The corresponding bucket is found and searched for a matching tuple.
- The result is stored in another buffer (that’s why nB > nD + 2)
- Best Case: nE + nD block accesses.
- Normal Case (smallest relation cannot fit in memory): 3(nE + nD) block accesses
Put-All-Together: Join Cost Prediction
- Naïve Join Cost: nE * nD: 20,000 block accesses
- Nested-Loop Cost (best): nD + nE * ceil(nD/(nB-2)): 4,010 block accesses
- Index-based Nested-Loop Cost (best): nD + rD*(xE + 1): 260 block accesses
- Sort-Merge Cost (already sorted): nE + nD: 2,010 block accesses
- Hash-Join Normal-Case Cost: 3(nE + nD): 6,030 block accesses
- Hold on a second: the cost for writing the result-set buffer (block) from memory to disk in each strategy is not yet considered!
- How many blocks are written? How many tuples are matched?…next weeks
So Far…
- Naïve Join: Exploit nothing. Cartesian product and then check…
- Nested-Loop: Exploit nothing. Computing-oriented join.
- Which relation should be in the outer loop? Influences the join cost.
- Can you predict the cost then? Optimization…
- Index-based Nested-Loop: Exploit at least one index. Use index to find the matched tuples as quick as possible ☺
- If we have two indexes (over R.A and over S.B), which one to use? Influences the join cost! Optimization…
- Merge-Join: Exploit both ordered relations; otherwise; sort them
- Hash-Join: Exploit hashing. Hash one relation.
- Use the same hashing function to find the matching tuples in the same bucket ☺
Use Case I
SELECT D.NAME, E.NAME FROM EMPLOYEE E, DEPARTMENT D WHERE E.DNO = D.DNUMBER
- Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
- Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
- Memory: nB = 22 blocks, bfrE = 2 employees per block
- Task: propose 3 strategies and choose the best.
Use Case I (Continued)
- Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
- Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
- Memory: nB = 22 blocks, bfrE = 2 employees per block
- Strategy 1: Use Clustering Index (DNO). For each department d, find details of its employees: e.DNO
- Cost for accessing a department in Clustering Index: log_2(nC) = 2 block accesses
- 2000 employees in 500 departments, i.e., m = 4 employees per department
- 2 blocks per department (m / bfrE)
- Cost-Clustering-Index = log_2(nC) + m / bfrE = 2 + 2 = 4 block accesses
- Cost-1: nD + rD*(log_2(nC) + m / bfrE) = 2010 block accesses
Use Case I (Continued 2)
- Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
- Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
- Memory: nB = 22 blocks, bfrE = 2 employees per block
- Strategy 2: Department relation fits in memory; Hash-join (M = 10 buckets).
- Hash Department and store into memory: nD = 10 block accesses.
- Map each employee to a bucket in main memory and search within the bucket.
- Partitioning phase cost: nD = 10 block accesses.
- Probing phase cost: nE = 100 block accesses.
- Cost-2: nD + nE = 110 block accesses
Use Case I (Continued 3)
- Clustering Index (1-level) ordering, non-key DNO with nC = 4 index-blocks
- Employee: nE = 100 blocks; rE = 2000; Department: rD = 500; nD = 10 blocks
- Memory: nB = 22 blocks, bfrE = 2 employees per block
- Strategy 3: Department relation is smaller than employee; nested-loop join
- Outer relation: Department
- Cost-3: nD + nE * ceil(nD/(nB-2)) = 10 + 100*(10/20) = 60 block accesses
- Strategy 1 (index-based join): 2010 block accesses
- Strategy 2 (hash-join): 110 block accesses
- Strategy 3 (nested-loop join): 60 block accesses
Use Case II
SELECT E.NAME, S.NAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPER_SSN = S.SSN
- Context:
- b = 2,000 blocks;
- r = 10,000 records (employees);
- B+ Index over SSN of level x = 5
- B+ Index over Super_SSN of level y = 2
- 10% of employees are supervisors;
- a supervisor does not have any supervisor: Super_SSN is NULL;
- Task: Propose a plan that minimizes the expected cost using Index- based Nested-loop.
Use Case II (Continued)
SELECT E.NAME, S.NAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPER_SSN = S.SSN
- Facts (Solution 1):
- PK is SSN & FK is Super_SSN.
- Use the B+ Index over SSN (ssn-index).
- Scan relation Employee once, i.e., b = 2000 block accesses
- For each block from Employee:
- For each tuple e check if this employee is a supervisor, i.e., Super_SSN is NULL
- IF employee e is NOT supervisor (w.p. 90%)
- THEN use ssn-index(e.super_ssn)
- ELSE go to next employee
- Total Cost: b + 0.9r(x+1) = 56,000 block accesses
Use Case II (Continued 2)
SELECT E.NAME, S.NAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPER_SSN = S.SSN
- Facts (Solution 2):
- PK is SSN & FK is Super_SSN.
- Use the B+ Index over Super_SSN (super-index).
- Scan relation Employee once, i.e., b = 2000 block accesses
- For each block:
- For each tuple e check if this employee is a supervisor, i.e., Super_SSN is NULL
- IF employee e is NOT supervisor (w.p. 90%)
- THEN use super-index(e.super_ssn)
- ELSE go to next employee
- Total Cost: b + 0.9r(y+1) = 29,000 block accesses (48% faster)
Hash-Join: Cost (Sketch) OPTIONAL
- Normal Case: The smallest relation cannot fit in memory.
- Partitioning Phase
- Read both relations E & D first (one block at a time);
- Partial Cost: nE + nD
- Partition into M buckets using with the same hashing function h,
- The M main buckets fit in memory; overflow blocks in disk!
- Store the hashed buckets of each relation to the disk.
- Partial Cost: nE + nD
- Probing Phase
- For each m =1…M bucket Do
- Read a pair: the m-th bucket from E and the m-th bucket from D
- Partial Cost: nE + nD
- Perform join focusing only on the tuples from the same bucket m
- Idea: e might be matched with d since h(e) = h(d) = m
- Expected Cost: 3(nE + nD)$$ block accesses.