CS483 9/22

Sorting Algorithms Overview

Comparison-Based Sorts and Lower Bounds
  • Comparison-based sorting algorithms, regardless of how they partition elements (e.g., into two or three parts), fundamentally rely on comparisons between elements.
  • This characteristic limits how fast they can possibly run: the theoretical lower bound for comparison-based sorting is Ω(N log N) comparisons in the worst case.
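One standard way to see this bound is the decision-tree argument: a comparison sort must distinguish all N! input orderings, and a binary tree with 2^h leaves needs height h ≥ log₂(N!) = Θ(N log N). A small numerical sketch (the value of `n` here is just an illustration):

```python
import math

# A comparison sort is a binary decision tree whose leaves are the N!
# possible input orderings, so its height h satisfies 2^h >= N!,
# i.e. h >= log2(N!) comparisons in the worst case.
n = 16
min_comparisons = math.ceil(math.log2(math.factorial(n)))
print(min_comparisons)              # 45 for n = 16
print(math.ceil(n * math.log2(n)))  # 64: same order of growth
```

Note that log₂(N!) and N log₂ N agree up to constant factors (Stirling's approximation), which is why the bound is stated as Ω(N log N).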
Quicksort Analysis: Uneven Partitions
  • Impact of Uneven Splits: When Quicksort doesn't pick a perfect median, it produces uneven partitions. For example, a pivot might split a list of N elements into pieces of size N/4 and 3N/4.
  • Recursive Calls: Each piece is then split again in the same ratio: the N/4 piece into blocks of N/16 and 3N/16, and the 3N/4 piece into blocks of 3N/16 and 9N/16.
  • Tree Structure Consequences: This uneven partitioning affects the structure of the recursion tree:
    • Some branches will be shorter (ending sooner) and some will be longer (ending later).
    • The height of the tree will vary across different branches.
    • The shortest path to a base case (e.g., a single element) follows the larger reduction factor at every level (dividing by 4 if one side is 1/4 of the list). Its height is log₄ N.
    • The longest path to a base case follows the smaller reduction factor at every level (dividing by 4/3 if the other side is 3/4 of the list). Its height is log₄⁄₃ N.
  • Asymptotic Equivalence of Logarithms: Despite the different bases, both heights are Θ(log N). The base of a logarithm does not change asymptotic growth, since log_a N = k · log_b N for the constant k = log_a b.
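The two path lengths above can be checked numerically. A small sketch (the helper name `depth` is illustrative, not from the lecture):

```python
import math

def depth(n, shrink):
    """Count how many times a size-n problem shrinks by factor `shrink`
    before reaching size <= 1 (i.e., the length of that recursion branch)."""
    steps = 0
    size = float(n)
    while size > 1:
        size *= shrink
        steps += 1
    return steps

N = 10**6
short = depth(N, 1/4)  # branch that always keeps the 1/4 side
long_ = depth(N, 3/4)  # branch that always keeps the 3/4 side

print(short, math.ceil(math.log(N, 4)))      # both 10
print(long_, math.ceil(math.log(N, 4/3)))    # both 49
```

Both branch lengths match their closed forms, and their ratio is the constant log 4 / log(4/3) ≈ 4.8, so both are Θ(log N).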
Randomized vs. Deterministic Algorithms
  • Randomized Algorithms (e.g., Randomized Quicksort):
    • May have a poor worst-case runtime (e.g., N²) but an excellent expected-case runtime (e.g., N log N).
    • The worst-case scenario is typically rare and requires a specific, often pathological, input sequence or consistently 'unlucky' random choices.
    • The randomness usually means that no single input consistently triggers the worst case; the worst case depends on the random choices made by the algorithm, not solely on the input data.
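A minimal sketch of randomized Quicksort, assuming the simple out-of-place partitioning style (real implementations usually partition in place):

```python
import random

def randomized_quicksort(a):
    """Quicksort with a uniformly random pivot: because the pivot is random,
    no fixed input order can consistently trigger the N^2 worst case."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

For example, `randomized_quicksort([3, 1, 4, 1, 5])` returns `[1, 1, 3, 4, 5]` regardless of which pivots are drawn; only the running time, not the result, depends on the random choices.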
  • Deterministic Algorithms (e.g., Heapsort, Mergesort):
    • Have a consistent runtime performance for a given input size, meaning their best, average, and worst-case runtimes are often the same (e.g., N log N for both Heapsort and Mergesort).
    • The runtime is entirely predictable for any given input.
    • Lack the unpredictability of randomized choices, which can be an advantage in scenarios where consistent performance is critical.
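As an illustration of such guaranteed behavior, here is a minimal Heapsort sketch built on Python's `heapq` module (a textbook heapsort would instead sift elements within the array in place):

```python
import heapq

def heapsort(a):
    """Guaranteed O(N log N) on every input: heapify in O(N),
    then pop the minimum N times at O(log N) per pop."""
    heap = list(a)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

No matter how the input is ordered, the number of comparisons is bounded by the same N log N shape; there is no "unlucky" case, which is the predictability advantage described above.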