Query Optimization, Plans, and Join Algorithms in Databases

0.0(0)

Studied by 0 people

0.0(0)

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/24

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

25 Terms

New cards

Goal of query optimization

To find the lowest-cost execution plan for a SQL query.

Intuition: Given many ways to run the same query, choose the fastest.

Example: Join orders (A⋈B⋈C), indexes, join algorithms all affect cost.

Why it matters: Exam questions explicitly test cost calculations and join ordering.

New cards

Query plan

A tree-structured representation of relational operations (joins, selections, scans).

Intuition: A blueprint for how the DB will execute the query.

Example: Selection → join → projection.

Why it matters: Understanding plans is essential for cost analysis.

New cards

Logical vs Physical Plan

Logical plan: abstract relational operators. Physical plan: actual algorithms (hash join, index scan).

Intuition: "What to do" vs "how to do it."

Example: Logical: Join R, S. Physical: Hash join using R as build.

Why it matters: Exam may ask which algorithm fits a logical operator.

New cards

Cost-based optimization

Optimizer chooses the lowest estimated cost plan based on statistics.

Intuition: DB estimates how expensive each possible plan would be.

Example: Choose hash join over nested loops.

Why it matters: Real DBMSs use cost-based models; exam tests formulas.

New cards

Selectivity

Fraction of rows that satisfy a predicate.

Intuition: "How picky is the filter?"

Example: Gender='F' → selectivity ≈ 0.5.

Why it matters: Used in calculating output sizes of filters.

New cards

Estimate cardinality of a selection

cardinality = input_rows × selectivity

Intuition: If 10% of rows match → output = 10% of table.

Example: 100k rows × 0.2 = 20k result rows.

Why it matters: Foundational for join cost questions.

New cards

Independence assumption

Assumes predicates are independent: sel(p1 AND p2) = sel(p1) × sel(p2).

Intuition: "Probability they both happen = multiply."

Example: Age<30 (0.3) and salary>50k (0.4) → 0.12.

Why it matters: Common exam assumption for cardinality.

New cards

Uniformity assumption

Data values are uniformly distributed across attribute ranges.

Intuition: All values equally likely.

Example: Histogram bucket of size 10 has equal row distribution.

Why it matters: Used in deriving selectivity and range costs.

New cards

Histogram in query optimization

A summary of the distribution of values in a column.

Intuition: Compressed picture of how data is spread.

Example: Age histogram: 0-10, 11-20, etc.

Why it matters: Query optimizers use histograms to estimate selectivity.

New cards

Why do optimizers use histograms?

To estimate selectivity more accurately than simple assumptions.

Intuition: Not all values occur equally often.

Example: Salary distribution is skewed; histogram captures skew.

Why it matters: Better estimates → better plans.

New cards

Nested Loops Join (NLJ)

For each tuple in outer relation, scan inner relation.

Intuition: Double for-loop.

Example: For each student, check all enrollments.

Why it matters: Used when outer table is small.

New cards

Index Nested Loops Join (INLJ)

Outer relation scanned; use index on inner relation for matching rows.

Intuition: "Find inner tuples fast using index lookups."

Example: StudentID index on Enrollments enables INLJ.

Why it matters: Huge performance improvement; exam often tests cost formula.

New cards

Hash Join

Build hash table on smaller relation; probe using bigger one.

Intuition: Group into buckets and match quickly.

Example: CustomerID join when Customer table fits in memory.

Why it matters: Often best join; exam asks when to use.

New cards

Sort-Merge Join

Sort both relations, then merge them.

Intuition: Works like merging sorted lists.

Example: Join large tables on sorted attributes.

Why it matters: Good for range joins and sorted data.

New cards

Join tree

A binary tree showing order of joins.

Intuition: Which joins you do first.

Example: ((R⋈S)⋈T)⋈U.

Why it matters: Different join orders have VERY different costs.

New cards

Left-Deep Join Tree

A join tree where the inner input is always a base relation.

Intuition: Never join two large intermediate results.

Example: (((R⋈S)⋈T)⋈U) is left-deep.

Why it matters: Optimizers search left-deep trees because they're efficient and pipelinable.

New cards

Why do optimizers prefer left-deep trees?

They allow pipelining and index usage and reduce memory needs.

Intuition: Process one table at a time; avoid huge intermediate results.

Example: Using index nested loops for each join in left-deep tree.

Why it matters: Exam may ask: "Why left-deep instead of bushy?"

New cards

Cost of scanning a heap file

= number of pages in file.

Intuition: Must read every page.

Example: File has 100 pages → cost = 100 I/Os.

Why it matters: Used in selection + join cost formulas.

New cards

Cost of unclustered index scan returning many rows

= index height + (1 I/O per matching tuple)

Intuition: Pointers jump to scattered pages → expensive.

Example: 10k matches → 10k I/Os.

Why it matters: Critical for understanding why unclustered indexes suck for large outputs.

New cards

Cost of clustered index range scan

= index height + (# data pages in range)

Intuition: Data pages are sequential → cheap.

Example: Range spans 5 pages → only 5 I/Os after index.

Why it matters: Important contrast with unclustered.

New cards

Cost of Index Nested Loops Join (INLJ)

= cost(outer scan) + |outer| × cost(index lookup)

Intuition: Scan outer, indexed lookup on inner.

Example: 1000 outer rows × 2-I/O lookup = 2000 I/Os.

Why it matters: Direct exam formula.

New cards

Cost of Block Nested Loops Join (BNLJ)

≈ (pages of outer / buffer pages) × pages of inner

Intuition: Load big chunks into memory to reduce passes.

Example: Outer=100 pages, buffer=10 → 10 iterations scanning inner.

Why it matters: Exam often compares BNLJ vs INLJ.

New cards

Cost of Hash Join

≈ 3 × (pages of R + pages of S)

Intuition: 1 pass to partition, 1 pass to build/probe, overhead factor.

Example: R=10 pages, S=20 pages → ≈ 90 I/Os.

Why it matters: Hash join cost formula appears frequently.

New cards

Cardinality of join estimation

|R ⋈ S| = |R| × |S| / max(V(R,A), V(S,A))

Intuition: Join distribution based on distinctness.

Example: If both have A with 100 distinct values: |R⋈S| = |R|×|S| / 100.

Why it matters: Final exam ALWAYS includes join cardinality math.

New cards

What is a plan space?

The set of all possible join trees and operator choices for a query.

Intuition: The search space optimizer must explore.

Example: 3-way join → many possible orders.

Why it matters: Left-deep vs bushy plan space appears in theory questions.

Explore top notes

Sonar

Updated 324d ago

Note

Chapter 1 - Introduction to Child Development

Updated 396d ago

Note

Mid-Term Exam Review

Updated 200d ago

Note

Chapter 21: Earth's Changing Surface

Updated 935d ago

Note

Psychological Test Report

Updated 1113d ago

Note

european expansion and exploration

Updated 977d ago

Note

FRQ 3 Novel Analysis Hamlet

Updated 1234d ago

Note

AP Environmental Science: Unit 8 Review - Aquatic & Terrestrial Pollution

Updated 1636d ago

Note

Sonar

Updated 324d ago

Note

Chapter 1 - Introduction to Child Development

Updated 396d ago

Note

Mid-Term Exam Review

Updated 200d ago

Note

Chapter 21: Earth's Changing Surface

Updated 935d ago

Note

Psychological Test Report

Updated 1113d ago

Note

european expansion and exploration

Updated 977d ago

Note

FRQ 3 Novel Analysis Hamlet

Updated 1234d ago

Note

AP Environmental Science: Unit 8 Review - Aquatic & Terrestrial Pollution

Updated 1636d ago

Note

Explore top flashcards

Theology 730 midterm

Updated 363d ago

Flashcards (85)

Prostate Cancer- Heemer

Updated 325d ago

Flashcards (32)

Business Flashcards

Updated 298d ago

Flashcards (173)

Spectroscopy & Spectrometry

Updated 1042d ago

Flashcards (51)

TUT3 R10 Active Listening

Flashcards (29)

Flashcards (124)

Flashcards (101)

APUSH - important dates quiz 1

Updated 5d ago

Flashcards (25)

Theology 730 midterm

Updated 363d ago

Flashcards (85)

Prostate Cancer- Heemer

Updated 325d ago

Flashcards (32)

Business Flashcards

Updated 298d ago

Flashcards (173)

Spectroscopy & Spectrometry

Updated 1042d ago

Flashcards (51)

TUT3 R10 Active Listening

Flashcards (29)

Flashcards (124)

Flashcards (101)

APUSH - important dates quiz 1

Updated 5d ago

Flashcards (25)