1/21
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
External Sorting
Sorting algorithms designed for data too large to fit in memory
Why External Sort?
Main memory is too small; must use disk-based sort
External Merge Sort
The most common algorithm for external sorting
Pass 0 (External Merge Sort)
Sort chunks of data (runs) in memory and write to disk
Subsequent Passes (Merge Phase)
Merge sorted runs from previous pass into larger runs
Run
A sorted sequence of tuples written to disk
Buffer Pages
Pages in memory used during sorting; determines # of runs that can be merged
number of runs in pass 0
ceil(N / B), where N = total pages, B = buffer pages
Fan-in (Merge Degree)
Number of runs merged in one pass; up to B-1
number of merge passes
ceil(log_(B-1)(# of initial runs))
Total Passes
1 (Pass 0) + number of merge passes
Total I/O Cost
2 * N * (1 + number of passes) = read + write for each pass
Double Buffering
Technique to overlap I/O and computation; improves sort performance
Replacement Selection
Creates longer initial runs using a priority queue (heap)
Limit of Replacement Selection
Run size ≈ 2 * B pages on average
Optimal Merge Tree
Merges runs in the most efficient order to minimize passes
Tuple Comparison Cost
Negligible compared to I/O in external sort
Sorting vs Hashing
Use sort for ordered output or range queries; use hash for equality joins
Applications of External Sort
ORDER BY, GROUP BY, DISTINCT, merge join
Minimizing I/O in Sort
Maximize memory buffers (B), use double buffering, merge efficiently
When to Use External Sort
Query result or data too large for RAM