Algorithm Analysis: QuickSelect and Deterministic Selection
Quick Select Average Case Analysis
Recurrence Relation for Average Running Time (Similar to Quick Sort but for
QuickSelect):The recurrence relation is given by .
Here, represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from to .
Goal: Find an exact expression or a tight bound for .
Method: Using a guess function and proving it by mathematical induction.
Guess Function for Quick Select: We aim to prove that for some constant .
Such a guess function is typical for bounding the growth rate.
Proof by Induction:
Basis Case ( ):
For , (as the sum range is from to , an empty sum, and selecting from a single element array takes no additional recursive work).
We need to check if remains true. is true.
Inductive Hypothesis: Assume that for some integer k > 1, for all integers such that 1 \le j < k.
Inductive Step: Prove that .
Using the recurrence relation for : (The initial term is for partitioning, assumed to be here).
Apply the inductive hypothesis () to the summation:
Factor out the constant :
Calculate the sum of the arithmetic series . Assuming is even for simplicity, the sum range is from to .
Number of terms: .
Sum: .
Substitute the sum back into the inequality:
Since 4k - 3 < 4k, the inequality holds true.
Conclusion: By induction, the guess function is correct. The average running time of Quick Select is .
Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)
Problem: Quick Select has an average-case running time of , but its worst-case running time is .
Question: Can we have a non-random (deterministic) algorithm that guarantees worst-case running time for finding the -th smallest element?
Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.
Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to . The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.
How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.
The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a
Quick Select Average Case Analysis
Recurrence Relation for Average Running Time (Similar to Quick Sort but for
QuickSelect):- The recurrence relation is given by .Here, represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from to .
Goal: Find an exact expression or a tight bound for .
Method: Using a guess function and proving it by mathematical induction.
Guess Function for Quick Select: We aim to prove that for some constant .- Such a guess function is typical for bounding the growth rate.
Proof by Induction:- Basis Case ( ):- For , (as the sum range is from to , an empty sum, and selecting from a single element array takes no additional recursive work).
- We need to check if remains true. is true.Inductive Hypothesis: Assume that for some integer k > 1, for all integers such that 1 \le j < k.
Inductive Step: Prove that .- Using the recurrence relation for : (The initial term is for partitioning, assumed to be here).
Apply the inductive hypothesis () to the summation:
Factor out the constant :
Calculate the sum of the arithmetic series . Assuming is even for simplicity, the sum range is from to .- Number of terms: .
Sum: .
Substitute the sum back into the inequality:
Since 4k - 3 < 4k, the inequality holds true.
Conclusion: By induction, the guess function is correct. The average running time of Quick Select is .
Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)
Problem: Quick Select has an average-case running time of , but its worst-case running time is .
Question: Can we have a non-random (deterministic) algorithm that guarantees worst-case running time for finding the -th smallest element?
Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.
Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to . The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.
How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.
The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a median of medians using a recursive approach:
Divide into Groups: Divide the elements into groups of . (If is not a multiple of , the last group will have fewer than elements).
Find Medians of Groups: Find the median of each of these groups. Sorting elements takes a constant amount of time.
Recursive Median-of-Medians: Recursively call the BFPRT algorithm on the medians found in the previous step to find their median, . This is chosen as the pivot.
Partition: Partition the original array around this pivot .
Recurse: Recursively call the algorithm on the appropriate partition (the one containing the -th smallest element). If the -th element is the pivot, return it.
Why this pivot is good: This "median of medians" pivot () guarantees that at least of the elements are on one side of the pivot and at most on the other side. This ensures a worst-case split of no worse than , preventing the worst-case behavior of standard QuickSelect and guaranteeing running time.