Database Storage, Indexing, and Materialized View Concepts for Advanced Data Management

0.0(0)
studied byStudied by 0 people
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/19

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

20 Terms

1
New cards

Heap file

Unordered storage of records; new records go anywhere with free space.

Intuition: A "junk drawer" of data — fastest to insert, slowest to search.

Example: Storing student records without sorting or indexing.

Why it matters: Used when inserts are frequent; exam questions compare heap vs sorted vs hashed files.

2
New cards

Sorted file

Records stored in order of a chosen attribute

Intuition: Like alphabetized documents — great for range queries, slower for inserts.

Example: Employees sorted by salary or name.

Why it matters: Query optimization questions ask which storage minimizes I/O.

3
New cards

Hashing (for storage)

Records placed in buckets using a hash function on a key.

Intuition: A "jump table" letting you find something instantly by hashing its value.

Example: Hash on CustomerID → bucket #7.

Why it matters: Perfect for equality queries; exam tests when hashing is bad (e.g., range).

4
New cards

Clustered index

Index where the table's physical order follows the index order

Intuition: The data is the index — very efficient for ranges.

Example: Students sorted on GPA; B+ tree on GPA is clustered.

Why it matters: Huge cost differences in IO; exam expects you to know clustered > unclustered for ranges.

5
New cards

Unclustered index

Index stored separately; table is not sorted by index columns

Intuition: Like an index at the back of a textbook pointing to random scattered pages.

Example: Index on address when table is sorted by ID.

Why it matters: Results in many random disk I/Os; appears in cost questions.

6
New cards

B+ Tree Index

Balanced tree index with keys in internal nodes and data pointers in leaves.

Intuition: A perfectly balanced directory to quickly look up values or ranges.

Example: B+ tree on Salary supports queries like salary ≥ 50k.

Why it matters: Most common exam topic for indexing; used heavily in query optimization.

7
New cards

Primary (clustered) vs Secondary (unclustered) Index

Primary index = clustered; secondary index = unclustered.

Intuition: Primary: data is physically ordered. Secondary: pointers point all over.

Example: Primary on employeeID, secondary on lastName.

Why it matters: Secondary indexes have high cost for multi-record reads; exam tests formulas.

8
New cards

Cost of using a secondary B+ tree for selection

Cost = index height + (# matching records × 1 I/O per record).

Intuition: Leaves point to scattered disk blocks → expensive.

Example: 10,000 matches → 10,000 I/Os.

Why it matters: Shows why clustered indexes matter; used in cost optimization questions.

9
New cards

Index Nested Loops Join (INLJ)

Outer table scanned; for each row, index lookup on inner table

Intuition: Use an index to avoid scanning the full inner table.

Example: For each student, index lookup on enrollments.studentID.

Why it matters: Used for selective joins; must know cost formulas.

10
New cards

Hash Join

Build a hash table on smaller relation; probe using larger relation

Intuition: Fast equality join — like grouping items into buckets.

Example: Join customers with orders by customerID.

Why it matters: Often the best join unless memory is tight; exam tests when to use.

11
New cards

Sort-Merge Join

Sort both relations, then scan and merge

Intuition: Like merging two sorted lists.

Example: Sorted employees and sorted departments on deptID.

Why it matters: Great for range joins and large inputs; appears in join-planning questions.

12
New cards

Selectivity

Fraction of rows that satisfy a predicate.

Intuition: "How picky is the filter?"

Example: Gender = female → ~50% selectivity.

Why it matters: Optimizer uses selectivity to choose plans; exam expects you to compute it.

13
New cards

Cardinality Estimation

Predicting output size of operations

Intuition: Guesses how many rows will survive each filter/join.

Example: If 10% of rows match A, then 5% match B, estimated = 0.5%.

Why it matters: You cannot choose a correct plan without cardinalities.

14
New cards

Left-Deep Join Tree

A join tree where every join's outer input is a base relation

Intuition: Always join a table with the result-so-far (never two large subtrees).

Example: (((R ⋈ S) ⋈ T) ⋈ U)

Why it matters: Optimizers prefer it because it allows efficient pipelining (and exam questions require you to draw them).

15
New cards

Histogram (for selectivity estimation)

Summary of data distribution used by optimizer

Intuition: A "compressed picture" of the column values.

Example: Histogram for ages: 18-25, 26-40, etc.

Why it matters: Determines selectivity; exam might ask how estimators work.

16
New cards

Materialized View

A view whose results are physically stored and periodically updated

Intuition: A cached result table.

Example: Daily sales summary view.

Why it matters: Final exam includes incremental maintenance formulas.

17
New cards

Incremental View Maintenance Formula

For MV = R ⋈ S: V_new = V_old ∪ (ΔR ⋈ S_old) ∪ (R_old ⋈ ΔS) ∪ (ΔR ⋈ ΔS)

Intuition: Only recompute the small changes, not full join.

Example: If 3 new rows added → only join those 3.

Why it matters: This formula is directly tested on exams.

18
New cards

Why deletions are harder in materialized views

Deletions require removing specific matching tuples from V

Intuition: Easy to add; harder to "find and subtract.

Example: Delete R tuple → must compute ΔR ⋈ S and subtract.

Why it matters: Exam frequently asks why SUM needs COUNT, MAX needs B+ tree.

19
New cards

SUM aggregate maintenance

Update SUM by adding or subtracting the new value

Intuition: SUM changes in tiny increments.

Example: Old SUM(Revenue) = 100 → insert (20) → new = 120.

Why it matters: Core concept of incremental maintenance.

20
New cards

MAX aggregate maintenance

When deleting the max value, must find next-largest value

Intuition: Deleting current max may change the entire aggregate.

Example: MAX = 50; delete row with 50 → must search for next max.

Why it matters: Requires B+ tree; exam tests this specifically.

Explore top flashcards

Nisäkkäät
Updated 773d ago
flashcards Flashcards (47)
31-35
Updated 79d ago
flashcards Flashcards (69)
BIOL 375 Exam 2
Updated 1026d ago
flashcards Flashcards (76)
MB3
Updated 191d ago
flashcards Flashcards (37)
Tema 6: Contexto 2
Updated 970d ago
flashcards Flashcards (30)
Emotions and moods
Updated 187d ago
flashcards Flashcards (114)
Nisäkkäät
Updated 773d ago
flashcards Flashcards (47)
31-35
Updated 79d ago
flashcards Flashcards (69)
BIOL 375 Exam 2
Updated 1026d ago
flashcards Flashcards (76)
MB3
Updated 191d ago
flashcards Flashcards (37)
Tema 6: Contexto 2
Updated 970d ago
flashcards Flashcards (30)
Emotions and moods
Updated 187d ago
flashcards Flashcards (114)