1/19
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
Heap file
Unordered storage of records; new records go anywhere with free space.
Intuition: A "junk drawer" of data — fastest to insert, slowest to search.
Example: Storing student records without sorting or indexing.
Why it matters: Used when inserts are frequent; exam questions compare heap vs sorted vs hashed files.
Sorted file
Records stored in order of a chosen attribute
Intuition: Like alphabetized documents — great for range queries, slower for inserts.
Example: Employees sorted by salary or name.
Why it matters: Query optimization questions ask which storage minimizes I/O.
Hashing (for storage)
Records placed in buckets using a hash function on a key.
Intuition: A "jump table" letting you find something instantly by hashing its value.
Example: Hash on CustomerID → bucket #7.
Why it matters: Perfect for equality queries; exam tests when hashing is bad (e.g., range).
Clustered index
Index where the table's physical order follows the index order
Intuition: The data is the index — very efficient for ranges.
Example: Students sorted on GPA; B+ tree on GPA is clustered.
Why it matters: Huge cost differences in IO; exam expects you to know clustered > unclustered for ranges.
Unclustered index
Index stored separately; table is not sorted by index columns
Intuition: Like an index at the back of a textbook pointing to random scattered pages.
Example: Index on address when table is sorted by ID.
Why it matters: Results in many random disk I/Os; appears in cost questions.
B+ Tree Index
Balanced tree index with keys in internal nodes and data pointers in leaves.
Intuition: A perfectly balanced directory to quickly look up values or ranges.
Example: B+ tree on Salary supports queries like salary ≥ 50k.
Why it matters: Most common exam topic for indexing; used heavily in query optimization.
Primary (clustered) vs Secondary (unclustered) Index
Primary index = clustered; secondary index = unclustered.
Intuition: Primary: data is physically ordered. Secondary: pointers point all over.
Example: Primary on employeeID, secondary on lastName.
Why it matters: Secondary indexes have high cost for multi-record reads; exam tests formulas.
Cost of using a secondary B+ tree for selection
Cost = index height + (# matching records × 1 I/O per record).
Intuition: Leaves point to scattered disk blocks → expensive.
Example: 10,000 matches → 10,000 I/Os.
Why it matters: Shows why clustered indexes matter; used in cost optimization questions.
Index Nested Loops Join (INLJ)
Outer table scanned; for each row, index lookup on inner table
Intuition: Use an index to avoid scanning the full inner table.
Example: For each student, index lookup on enrollments.studentID.
Why it matters: Used for selective joins; must know cost formulas.
Hash Join
Build a hash table on smaller relation; probe using larger relation
Intuition: Fast equality join — like grouping items into buckets.
Example: Join customers with orders by customerID.
Why it matters: Often the best join unless memory is tight; exam tests when to use.
Sort-Merge Join
Sort both relations, then scan and merge
Intuition: Like merging two sorted lists.
Example: Sorted employees and sorted departments on deptID.
Why it matters: Great for range joins and large inputs; appears in join-planning questions.
Selectivity
Fraction of rows that satisfy a predicate.
Intuition: "How picky is the filter?"
Example: Gender = female → ~50% selectivity.
Why it matters: Optimizer uses selectivity to choose plans; exam expects you to compute it.
Cardinality Estimation
Predicting output size of operations
Intuition: Guesses how many rows will survive each filter/join.
Example: If 10% of rows match A, then 5% match B, estimated = 0.5%.
Why it matters: You cannot choose a correct plan without cardinalities.
Left-Deep Join Tree
A join tree where every join's outer input is a base relation
Intuition: Always join a table with the result-so-far (never two large subtrees).
Example: (((R ⋈ S) ⋈ T) ⋈ U)
Why it matters: Optimizers prefer it because it allows efficient pipelining (and exam questions require you to draw them).
Histogram (for selectivity estimation)
Summary of data distribution used by optimizer
Intuition: A "compressed picture" of the column values.
Example: Histogram for ages: 18-25, 26-40, etc.
Why it matters: Determines selectivity; exam might ask how estimators work.
Materialized View
A view whose results are physically stored and periodically updated
Intuition: A cached result table.
Example: Daily sales summary view.
Why it matters: Final exam includes incremental maintenance formulas.
Incremental View Maintenance Formula
For MV = R ⋈ S: V_new = V_old ∪ (ΔR ⋈ S_old) ∪ (R_old ⋈ ΔS) ∪ (ΔR ⋈ ΔS)
Intuition: Only recompute the small changes, not full join.
Example: If 3 new rows added → only join those 3.
Why it matters: This formula is directly tested on exams.
Why deletions are harder in materialized views
Deletions require removing specific matching tuples from V
Intuition: Easy to add; harder to "find and subtract.
Example: Delete R tuple → must compute ΔR ⋈ S and subtract.
Why it matters: Exam frequently asks why SUM needs COUNT, MAX needs B+ tree.
SUM aggregate maintenance
Update SUM by adding or subtracting the new value
Intuition: SUM changes in tiny increments.
Example: Old SUM(Revenue) = 100 → insert (20) → new = 120.
Why it matters: Core concept of incremental maintenance.
MAX aggregate maintenance
When deleting the max value, must find next-largest value
Intuition: Deleting current max may change the entire aggregate.
Example: MAX = 50; delete row with 50 → must search for next max.
Why it matters: Requires B+ tree; exam tests this specifically.