Heap Sort & Priority-Queue-Based Sorting

Algorithm X (Generic Priority-Queue–Based Sorting)

  • Definition: A family of comparison–based sorting algorithms that repeatedly
    1. insert all nn input keys into a priority queue (PQ)
    2. repeatedly remove the extreme key (min or max) and append it to the output.
  • Running-time & space
    • Worst-case depends entirely on the concrete PQ implementation.
    • With a binary heap as the PQ:
      • Each insert\text{insert} costs Θ(logn)\Theta(\log n).
      • Each removeMax\text{removeMax} (or removeMin\text{removeMin}) costs Θ(logn)\Theta(\log n).
      • Total 2n2n operations ⇒ nlogn+nlogn=2nlogn=Θ(nlogn)n\log n + n\log n = 2n\log n = \Theta(n\log n).
      • Space is the heap itself → O(n)O(n).
    • With a Fibonacci heap the theoretical bound drops to O(n+nloglogn)O(n + n\log \log n) for insert + nlognn\log n for delete-max, but constant factors are larger.
  • Key point: The label “Algorithm X” merely refers to PQ-driven sorting; the lecture instantiates it with a heap to obtain Heapsort.

Heap-based Priority Queue ➜ Heap Sort

  • When the PQ is a binary max-heap, Algorithm X specializes to Heapsort.
  • Performance summary (binary heap):
    • Time: O(nlogn)O(n \log n) in all cases (best, average, worst).
    • Extra space: O(1)O(1) beyond the input array when performed in-place.
  • Relative speed (empirical observation):
    • "Much faster than" elementary quadratic sorts (Insertion & Selection) once n30!!50n\gtrsim 30!–!50.

In-place Heapsort – High-level Plan

  • View the input array as a complete binary tree. Indexing starts at 1 for mathematical convenience.
  • Two distinct phases:
    1. Heap Construction (Heapify)
    • Convert the unordered array into a max-heap.
    • Bottom-up method: run sink\text{sink} (a.k.a. down-heap or heapify) on every internal node k=n/2,,1k = \lfloor n/2 \rfloor,\dots,1.
    • Runs in O(n)O(n) time (not O(nlogn)O(n \log n)) because the lower levels are cheap.
    1. Sortdown
    • Repeat while heap size k>1: exchange the root (max) with item kk, shrink the heap boundary, and sink the new root.
    • Each iteration costs Θ(logk)\Theta(\log k), summing to Θ(nlogn)\Theta(n \log n) overall.

Detailed Algorithmic Operations

  • sink(pq, k, n)
    • While the current node kk has at least one child (index 2kn2k \le n):
    1. Let j=2kj = 2k (left child).
    2. If right child exists and \text{pq}[j] < \text{pq}[j+1], set j=j+1j = j+1.
    3. If the parent pq[k]pq[j]\text{pq}[k] \ge \text{pq}[j], stop (heap order restored).
    4. Else exchange pq[k]\text{pq}[k] and pq[j]\text{pq}[j], set k=jk=j, and continue.
  • exch(pq, i, j) – constant-time swap routine.

Worked Example – Sorting the String “SORTEXAMPLE”

  • Initial array (indexed 1..11):
    S  O  R  T  E  X  A  M  P  L  ES\;O\;R\;T\;E\;X\;A\;M\;P\;L\;E
  • Heapify Trace (selected snapshots)
    • After bottom-up sinks: X  T  S  P  L  R  A  M  O  E  EX\;T\;S\;P\;L\;R\;A\;M\;O\;E\;E (valid max-heap).
  • Sortdown Trace (selected snapshots)
    • Exchange root with last element, sink, repeat → final array: A  E  E  L  M  O  P  R  S  T  XA\;E\;E\;L\;M\;O\;P\;R\;S\;T\;X.
  • Visual slides 4–25 & 26–80 show every intermediate heap and array state; the key pedagogical takeaway is heap property before the unsorted prefix and sorted suffix growing right-to-left.

Java-like Reference Implementation (Lines 57-70, 78-89)

public static void sort(Comparable[] pq) {
    int n = pq.length;
    // heapify phase
    for (int k = n/2; k >= 1; k--)      // indices 1..n assumed
        sink(pq, k, n);
    // sortdown phase
    for (int k = n; k > 1; k--) {
        exch(pq, 1, k);
        sink(pq, 1, k-1);
    }
}
private static void sink(Comparable[] pq, int k, int n) {
    while (2*k <= n) {
        int j = 2*k;
        if (j < n && less(pq, j, j+1)) j++;
        if (!less(pq, k, j)) break;
        exch(pq, k, j);
        k = j;
    }
}
  • Uses 1-based indexing; adapt easily to 0-based arrays by arithmetic offsets.

Comparison with Selection & Insertion Sort

  • Decision table (slides 85–88):
    • Selection Sort: O(n2)O(n^2) time, O(1)O(1) space, in-place ✓, not adaptive ✗, not stable ✗.
    • Insertion Sort: O(n2)O(n^2) time, O(1)O(1) space, in-place ✓, adaptive ✓, stable ✓.
    • Heapsort: O(nlogn)O(n \log n) time, O(1)O(1) extra space, in-place ✓, not adaptive ✗, unstable ✗ (equal keys can be permuted).

Algorithm-by-Algorithm “State After k Iterations” (slide 92)

  • Selection Sort: first kk positions are final & minimal so far.
  • Insertion Sort: first kk positions are sorted but may still shift right later.
  • Heap Sort: last kk positions (right end) hold the kk largest keys in final order; prefix 1..nkn-k forms a valid max-heap.
  • Merge Sort (recursive): every subarray produced by recursive calls is already sorted.
  • Merge Sort (bottom-up): all runs of length 2k2^k are sorted after pass kk.
  • Quick Sort (standard or 3-way): pivot element A[j]A[j] is fixed; left side << pivot, right side >> pivot; 3-way adds contiguous block of == pivot.

Practical & Philosophical Notes

  • In-place yet not stable: The constant-space virtue comes at the expense of re-ordering equal keys.
  • Not adaptive: Even presorted input costs Θ(nlogn)\Theta(n \log n) because heapify already runs O(n)O(n) and every subsequent sink Θ(logn)\Theta(\log n).
  • Cache behavior: Poorer locality than quicksort or insertion sort; modern CPUs often find Heapsort slower than well-implemented quicksort on practical inputs.
    However, it offers a worst-case guarantee without extra memory, making it valuable in resource-bounded systems.
  • Security relevance: Because Heapsort’s time bound is deterministic O(nlogn)O(n \log n), it avoids the adversarial O(n2)O(n^2) patterns that can plague naive quicksort, useful in untrusted-input contexts.

Key Equations & Numbers to Memorize

  • Worst-case running time: T(n)=nlogn+O(n)=Θ(nlogn)T(n) = n \log n + O(n) = \Theta(n \log n).
  • Extra memory: S(n)=O(1)S(n) = O(1) (just a few index variables).
  • Bottom-up heap construction cost: h=0lognn2h+1h=O(n)\displaystyle \sum_{h=0}^{\lfloor \log n \rfloor} \frac{n}{2^{h+1}} \cdot h = O(n).
  • Heapsort vs. quadratic sorts: limnnlognn2=0\lim_{n\to\infty} \frac{n \log n}{n^2} = 0 ⇒ asymptotically faster.

Implementation Gotchas & Tips

  • When using 0-based arrays in C/Java/Python, children of index ii are 2i+12i+1 and 2i+22i+2; parent is (i1)/2\lfloor (i-1)/2 \rfloor.
  • Always exchange root with last element in heap, not necessarily with array index nn; the heap size shrinks each iteration.
  • For identical keys, Heapsort’s lack of stability may or may not matter; wrap (key,value) pairs to preserve order if needed.

Summary

  • Heapsort is an in-place, comparison-based sort with deterministic Θ(nlogn)\Theta(n \log n) runtime and O(1)O(1) extra space.
  • Achieves this by: (1) linear-time bottom-up heap construction, (2) nn successive delete-max operations, (3) exploiting the input array both as heap storage and as the final sorted container.
  • While theoretically elegant and worst-case optimal, real-world performance can lag behind quicksort/merge sort because of cache effects and constant factors.