1/24
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
Goal of query optimization
To find the lowest-cost execution plan for a SQL query.
Intuition: Given many ways to run the same query, choose the fastest.
Example: Join orders (A⋈B⋈C), indexes, join algorithms all affect cost.
Why it matters: Exam questions explicitly test cost calculations and join ordering.
Query plan
A tree-structured representation of relational operations (joins, selections, scans).
Intuition: A blueprint for how the DB will execute the query.
Example: Selection → join → projection.
Why it matters: Understanding plans is essential for cost analysis.
Logical vs Physical Plan
Logical plan: abstract relational operators. Physical plan: actual algorithms (hash join, index scan).
Intuition: "What to do" vs "how to do it."
Example: Logical: Join R, S. Physical: Hash join using R as build.
Why it matters: Exam may ask which algorithm fits a logical operator.
Cost-based optimization
Optimizer chooses the lowest estimated cost plan based on statistics.
Intuition: DB estimates how expensive each possible plan would be.
Example: Choose hash join over nested loops.
Why it matters: Real DBMSs use cost-based models; exam tests formulas.
Selectivity
Fraction of rows that satisfy a predicate.
Intuition: "How picky is the filter?"
Example: Gender='F' → selectivity ≈ 0.5.
Why it matters: Used in calculating output sizes of filters.
Estimate cardinality of a selection
cardinality = input_rows × selectivity
Intuition: If 10% of rows match → output = 10% of table.
Example: 100k rows × 0.2 = 20k result rows.
Why it matters: Foundational for join cost questions.
Independence assumption
Assumes predicates are independent: sel(p1 AND p2) = sel(p1) × sel(p2).
Intuition: "Probability they both happen = multiply."
Example: Age<30 (0.3) and salary>50k (0.4) → 0.12.
Why it matters: Common exam assumption for cardinality.
Uniformity assumption
Data values are uniformly distributed across attribute ranges.
Intuition: All values equally likely.
Example: Histogram bucket of size 10 has equal row distribution.
Why it matters: Used in deriving selectivity and range costs.
Histogram in query optimization
A summary of the distribution of values in a column.
Intuition: Compressed picture of how data is spread.
Example: Age histogram: 0-10, 11-20, etc.
Why it matters: Query optimizers use histograms to estimate selectivity.
Why do optimizers use histograms?
To estimate selectivity more accurately than simple assumptions.
Intuition: Not all values occur equally often.
Example: Salary distribution is skewed; histogram captures skew.
Why it matters: Better estimates → better plans.
Nested Loops Join (NLJ)
For each tuple in outer relation, scan inner relation.
Intuition: Double for-loop.
Example: For each student, check all enrollments.
Why it matters: Used when outer table is small.
Index Nested Loops Join (INLJ)
Outer relation scanned; use index on inner relation for matching rows.
Intuition: "Find inner tuples fast using index lookups."
Example: StudentID index on Enrollments enables INLJ.
Why it matters: Huge performance improvement; exam often tests cost formula.
Hash Join
Build hash table on smaller relation; probe using bigger one.
Intuition: Group into buckets and match quickly.
Example: CustomerID join when Customer table fits in memory.
Why it matters: Often best join; exam asks when to use.
Sort-Merge Join
Sort both relations, then merge them.
Intuition: Works like merging sorted lists.
Example: Join large tables on sorted attributes.
Why it matters: Good for range joins and sorted data.
Join tree
A binary tree showing order of joins.
Intuition: Which joins you do first.
Example: ((R⋈S)⋈T)⋈U.
Why it matters: Different join orders have VERY different costs.
Left-Deep Join Tree
A join tree where the inner input is always a base relation.
Intuition: Never join two large intermediate results.
Example: (((R⋈S)⋈T)⋈U) is left-deep.
Why it matters: Optimizers search left-deep trees because they're efficient and pipelinable.
Why do optimizers prefer left-deep trees?
They allow pipelining and index usage and reduce memory needs.
Intuition: Process one table at a time; avoid huge intermediate results.
Example: Using index nested loops for each join in left-deep tree.
Why it matters: Exam may ask: "Why left-deep instead of bushy?"
Cost of scanning a heap file
= number of pages in file.
Intuition: Must read every page.
Example: File has 100 pages → cost = 100 I/Os.
Why it matters: Used in selection + join cost formulas.
Cost of unclustered index scan returning many rows
= index height + (1 I/O per matching tuple)
Intuition: Pointers jump to scattered pages → expensive.
Example: 10k matches → 10k I/Os.
Why it matters: Critical for understanding why unclustered indexes suck for large outputs.
Cost of clustered index range scan
= index height + (# data pages in range)
Intuition: Data pages are sequential → cheap.
Example: Range spans 5 pages → only 5 I/Os after index.
Why it matters: Important contrast with unclustered.
Cost of Index Nested Loops Join (INLJ)
= cost(outer scan) + |outer| × cost(index lookup)
Intuition: Scan outer, indexed lookup on inner.
Example: 1000 outer rows × 2-I/O lookup = 2000 I/Os.
Why it matters: Direct exam formula.
Cost of Block Nested Loops Join (BNLJ)
≈ (pages of outer / buffer pages) × pages of inner
Intuition: Load big chunks into memory to reduce passes.
Example: Outer=100 pages, buffer=10 → 10 iterations scanning inner.
Why it matters: Exam often compares BNLJ vs INLJ.
Cost of Hash Join
≈ 3 × (pages of R + pages of S)
Intuition: 1 pass to partition, 1 pass to build/probe, overhead factor.
Example: R=10 pages, S=20 pages → ≈ 90 I/Os.
Why it matters: Hash join cost formula appears frequently.
Cardinality of join estimation
|R ⋈ S| = |R| × |S| / max(V(R,A), V(S,A))
Intuition: Join distribution based on distinctness.
Example: If both have A with 100 distinct values: |R⋈S| = |R|×|S| / 100.
Why it matters: Final exam ALWAYS includes join cardinality math.
What is a plan space?
The set of all possible join trees and operator choices for a query.
Intuition: The search space optimizer must explore.
Example: 3-way join → many possible orders.
Why it matters: Left-deep vs bushy plan space appears in theory questions.