Query Optimization, Plans, and Join Algorithms in Databases

0.0(0)
studied byStudied by 0 people
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/24

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

25 Terms

1
New cards

Goal of query optimization

To find the lowest-cost execution plan for a SQL query.

Intuition: Given many ways to run the same query, choose the fastest.

Example: Join orders (A⋈B⋈C), indexes, join algorithms all affect cost.

Why it matters: Exam questions explicitly test cost calculations and join ordering.

2
New cards

Query plan

A tree-structured representation of relational operations (joins, selections, scans).

Intuition: A blueprint for how the DB will execute the query.

Example: Selection → join → projection.

Why it matters: Understanding plans is essential for cost analysis.

3
New cards

Logical vs Physical Plan

Logical plan: abstract relational operators. Physical plan: actual algorithms (hash join, index scan).

Intuition: "What to do" vs "how to do it."

Example: Logical: Join R, S. Physical: Hash join using R as build.

Why it matters: Exam may ask which algorithm fits a logical operator.

4
New cards

Cost-based optimization

Optimizer chooses the lowest estimated cost plan based on statistics.

Intuition: DB estimates how expensive each possible plan would be.

Example: Choose hash join over nested loops.

Why it matters: Real DBMSs use cost-based models; exam tests formulas.

5
New cards

Selectivity

Fraction of rows that satisfy a predicate.

Intuition: "How picky is the filter?"

Example: Gender='F' → selectivity ≈ 0.5.

Why it matters: Used in calculating output sizes of filters.

6
New cards

Estimate cardinality of a selection

cardinality = input_rows × selectivity

Intuition: If 10% of rows match → output = 10% of table.

Example: 100k rows × 0.2 = 20k result rows.

Why it matters: Foundational for join cost questions.

7
New cards

Independence assumption

Assumes predicates are independent: sel(p1 AND p2) = sel(p1) × sel(p2).

Intuition: "Probability they both happen = multiply."

Example: Age<30 (0.3) and salary>50k (0.4) → 0.12.

Why it matters: Common exam assumption for cardinality.

8
New cards

Uniformity assumption

Data values are uniformly distributed across attribute ranges.

Intuition: All values equally likely.

Example: Histogram bucket of size 10 has equal row distribution.

Why it matters: Used in deriving selectivity and range costs.

9
New cards

Histogram in query optimization

A summary of the distribution of values in a column.

Intuition: Compressed picture of how data is spread.

Example: Age histogram: 0-10, 11-20, etc.

Why it matters: Query optimizers use histograms to estimate selectivity.

10
New cards

Why do optimizers use histograms?

To estimate selectivity more accurately than simple assumptions.

Intuition: Not all values occur equally often.

Example: Salary distribution is skewed; histogram captures skew.

Why it matters: Better estimates → better plans.

11
New cards

Nested Loops Join (NLJ)

For each tuple in outer relation, scan inner relation.

Intuition: Double for-loop.

Example: For each student, check all enrollments.

Why it matters: Used when outer table is small.

12
New cards

Index Nested Loops Join (INLJ)

Outer relation scanned; use index on inner relation for matching rows.

Intuition: "Find inner tuples fast using index lookups."

Example: StudentID index on Enrollments enables INLJ.

Why it matters: Huge performance improvement; exam often tests cost formula.

13
New cards

Hash Join

Build hash table on smaller relation; probe using bigger one.

Intuition: Group into buckets and match quickly.

Example: CustomerID join when Customer table fits in memory.

Why it matters: Often best join; exam asks when to use.

14
New cards

Sort-Merge Join

Sort both relations, then merge them.

Intuition: Works like merging sorted lists.

Example: Join large tables on sorted attributes.

Why it matters: Good for range joins and sorted data.

15
New cards

Join tree

A binary tree showing order of joins.

Intuition: Which joins you do first.

Example: ((R⋈S)⋈T)⋈U.

Why it matters: Different join orders have VERY different costs.

16
New cards

Left-Deep Join Tree

A join tree where the inner input is always a base relation.

Intuition: Never join two large intermediate results.

Example: (((R⋈S)⋈T)⋈U) is left-deep.

Why it matters: Optimizers search left-deep trees because they're efficient and pipelinable.

17
New cards

Why do optimizers prefer left-deep trees?

They allow pipelining and index usage and reduce memory needs.

Intuition: Process one table at a time; avoid huge intermediate results.

Example: Using index nested loops for each join in left-deep tree.

Why it matters: Exam may ask: "Why left-deep instead of bushy?"

18
New cards

Cost of scanning a heap file

= number of pages in file.

Intuition: Must read every page.

Example: File has 100 pages → cost = 100 I/Os.

Why it matters: Used in selection + join cost formulas.

19
New cards

Cost of unclustered index scan returning many rows

= index height + (1 I/O per matching tuple)

Intuition: Pointers jump to scattered pages → expensive.

Example: 10k matches → 10k I/Os.

Why it matters: Critical for understanding why unclustered indexes suck for large outputs.

20
New cards

Cost of clustered index range scan

= index height + (# data pages in range)

Intuition: Data pages are sequential → cheap.

Example: Range spans 5 pages → only 5 I/Os after index.

Why it matters: Important contrast with unclustered.

21
New cards

Cost of Index Nested Loops Join (INLJ)

= cost(outer scan) + |outer| × cost(index lookup)

Intuition: Scan outer, indexed lookup on inner.

Example: 1000 outer rows × 2-I/O lookup = 2000 I/Os.

Why it matters: Direct exam formula.

22
New cards

Cost of Block Nested Loops Join (BNLJ)

≈ (pages of outer / buffer pages) × pages of inner

Intuition: Load big chunks into memory to reduce passes.

Example: Outer=100 pages, buffer=10 → 10 iterations scanning inner.

Why it matters: Exam often compares BNLJ vs INLJ.

23
New cards

Cost of Hash Join

≈ 3 × (pages of R + pages of S)

Intuition: 1 pass to partition, 1 pass to build/probe, overhead factor.

Example: R=10 pages, S=20 pages → ≈ 90 I/Os.

Why it matters: Hash join cost formula appears frequently.

24
New cards

Cardinality of join estimation

|R ⋈ S| = |R| × |S| / max(V(R,A), V(S,A))

Intuition: Join distribution based on distinctness.

Example: If both have A with 100 distinct values: |R⋈S| = |R|×|S| / 100.

Why it matters: Final exam ALWAYS includes join cardinality math.

25
New cards

What is a plan space?

The set of all possible join trees and operator choices for a query.

Intuition: The search space optimizer must explore.

Example: 3-way join → many possible orders.

Why it matters: Left-deep vs bushy plan space appears in theory questions.

Explore top flashcards