External Sorting

0.0(0)
studied byStudied by 0 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/21

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

22 Terms

1
New cards

External Sorting

Sorting algorithms designed for data too large to fit in memory

2
New cards

Why External Sort?

Main memory is too small; must use disk-based sort

3
New cards

External Merge Sort

The most common algorithm for external sorting

4
New cards

Pass 0 (External Merge Sort)

Sort chunks of data (runs) in memory and write to disk

5
New cards

Subsequent Passes (Merge Phase)

Merge sorted runs from previous pass into larger runs

6
New cards

Run

A sorted sequence of tuples written to disk

7
New cards

Buffer Pages

Pages in memory used during sorting; determines # of runs that can be merged

8
New cards

number of runs in pass 0

ceil(N / B), where N = total pages, B = buffer pages

9
New cards

Fan-in (Merge Degree)

Number of runs merged in one pass; up to B-1

10
New cards

number of merge passes

ceil(log_(B-1)(# of initial runs))

11
New cards

Total Passes

1 (Pass 0) + number of merge passes

12
New cards

Total I/O Cost

2 * N * (1 + number of passes) = read + write for each pass

13
New cards

Double Buffering

Technique to overlap I/O and computation; improves sort performance

14
New cards

Replacement Selection

Creates longer initial runs using a priority queue (heap)

15
New cards

Limit of Replacement Selection

Run size ≈ 2 * B pages on average

16
New cards

Optimal Merge Tree

Merges runs in the most efficient order to minimize passes

17
New cards

Tuple Comparison Cost

Negligible compared to I/O in external sort

18
New cards

Sorting vs Hashing

Use sort for ordered output or range queries; use hash for equality joins

19
New cards

Applications of External Sort

ORDER BY, GROUP BY, DISTINCT, merge join

20
New cards

Minimizing I/O in Sort

Maximize memory buffers (B), use double buffering, merge efficiently

21
New cards

When to Use External Sort

Query result or data too large for RAM

22
New cards