Algorithm Analysis: QuickSelect and Deterministic Selection
Quick Select Average Case Analysis
Recurrence Relation for Average Running Time (Similar to Quick Sort but for
QuickSelect):The recurrence relation is given by T(n) = n + \frac{2}{n} \sum_{j=\lfloor n/2 \rfloor}^{n-1} T(j).
Here, n represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from n/2 to n-1.
Goal: Find an exact expression or a tight bound for T(n).
Method: Using a guess function and proving it by mathematical induction.
Guess Function for Quick Select: We aim to prove that T(n) \le 4n for some constant 4.
Such a guess function is typical for bounding the growth rate.
Proof by Induction:
Basis Case ( n=1 ):
For n=1, T(1) = 0 (as the sum range is from \lfloor 1/2 \rfloor = 0 to 1-1=0, an empty sum, and selecting from a single element array takes no additional recursive work).
We need to check if T(1) \le 4 \cdot 1 remains true. 0 \le 4 is true.
Inductive Hypothesis: Assume that for some integer k > 1, T(j) \le 4j for all integers j such that 1 \le j < k.
Inductive Step: Prove that T(k) \le 4k.
Using the recurrence relation for T(k): T(k) = (k-1) + \frac{2}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} T(j) (The initial n term is for partitioning, assumed to be k-1 here).
Apply the inductive hypothesis (T(j) \le 4j) to the summation:
{T(k) \le (k-1) + \frac{2}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} (4j)}Factor out the constant 4:
{T(k) \le (k-1) + \frac{8}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} j}Calculate the sum of the arithmetic series \sum_{j=\lfloor k/2 \rfloor}^{k-1} j. Assuming k is even for simplicity, the sum range is from k/2 to k-1.
Number of terms: (k-1) - (k/2) + 1 = k/2.
Sum: \frac{\text{(num terms)} \cdot (\text{first term} + \text{last term})}{2} = \frac{(k/2) \cdot (k/2 + k-1)}{2} = \frac{(k/2) \cdot (3k/2 - 1)}{2} = \frac{k(3k - 2)}{8}.
Substitute the sum back into the inequality:
{T(k) \le (k-1) + \frac{8}{k} \left( \frac{k(3k - 2)}{8} \right)}
{T(k) \le (k-1) + (3k - 2)}
{T(k) \le 4k - 3}Since 4k - 3 < 4k, the inequality T(k) \le 4k holds true.
Conclusion: By induction, the guess function is correct. The average running time of Quick Select is \Theta(n).
Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)
Problem: Quick Select has an average-case running time of \Theta(n), but its worst-case running time is \Theta(n^2).
Question: Can we have a non-random (deterministic) algorithm that guarantees \Theta(n) worst-case running time for finding the i-th smallest element?
Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.
Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to n^2. The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.
How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.
The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a
Quick Select Average Case Analysis
Recurrence Relation for Average Running Time (Similar to Quick Sort but for
QuickSelect):- The recurrence relation is given by T(n) = n + \frac{2}{n} \sum_{\lfloor n/2 \rfloor}^{n-1} T(j).Here, n represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from n/2 to n-1.
Goal: Find an exact expression or a tight bound for T(n).
Method: Using a guess function and proving it by mathematical induction.
Guess Function for Quick Select: We aim to prove that T(n) \le 4n for some constant 4.- Such a guess function is typical for bounding the growth rate.
Proof by Induction:- Basis Case ( n=1 ):- For n=1, T(1) = 0 (as the sum range is from \lfloor 1/2 \rfloor = 0 to 1-1=0, an empty sum, and selecting from a single element array takes no additional recursive work).
- We need to check if T(1) \le 4 \cdot 1 remains true. 0 \le 4 is true.Inductive Hypothesis: Assume that for some integer k > 1, T(j) \le 4j for all integers j such that 1 \le j < k.
Inductive Step: Prove that T(k) \le 4k.- Using the recurrence relation for T(k): T(k) = (k-1) + \frac{2}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} T(j) (The initial n term is for partitioning, assumed to be k-1 here).
Apply the inductive hypothesis (T(j) \le 4j) to the summation:
{T(k) \le (k-1) + \frac{2}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} (4j)}
Factor out the constant 4:
{T(k) \le (k-1) + \frac{8}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} j}
Calculate the sum of the arithmetic series \sum_{\lfloor k/2 \rfloor}^{k-1} j. Assuming k is even for simplicity, the sum range is from k/2 to k-1.- Number of terms: (k-1) - (k/2) + 1 = k/2.
Sum: \frac{\text{(num terms)} \cdot (\text{first term} + \text{last term})}{2} = \frac{(k/2) \cdot (k/2 + k-1)}{2} = \frac{(k/2) \cdot (3k/2 - 1)}{2} = \frac{k(3k - 2)}{8}.
Substitute the sum back into the inequality:
{T(k) \le (k-1) + \frac{8}{k} \left( \frac{k(3k - 2)}{8} \right)}
{T(k) \le (k-1) + (3k - 2)}
{T(k) \le 4k - 3}
Since 4k - 3 < 4k, the inequality T(k) \le 4k holds true.
Conclusion: By induction, the guess function is correct. The average running time of Quick Select is \Theta(n).
Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)
Problem: Quick Select has an average-case running time of \Theta(n), but its worst-case running time is \Theta(n^2).
Question: Can we have a non-random (deterministic) algorithm that guarantees \Theta(n) worst-case running time for finding the i-th smallest element?
Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.
Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to n^2. The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.
How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.
The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a median of medians using a recursive approach:
Divide into Groups: Divide the n elements into groups of 5. (If n is not a multiple of 5, the last group will have fewer than 5 elements).
Find Medians of Groups: Find the median of each of these \lceil n/5 \rceil groups. Sorting 5 elements takes a constant amount of time.
Recursive Median-of-Medians: Recursively call the BFPRT algorithm on the \lceil n/5 \rceil medians found in the previous step to find their median, mm. This mm is chosen as the pivot.
Partition: Partition the original array around this pivot m_m.
Recurse: Recursively call the algorithm on the appropriate partition (the one containing the i-th smallest element). If the i-th element is the pivot, return it.
Why this pivot is good: This "median of medians" pivot (m_m) guarantees that at least 3/10 of the elements are on one side of the pivot and at most 7/10 on the other side. This ensures a worst-case split of no worse than 30-70 \%, preventing the O(n^2) worst-case behavior of standard QuickSelect and guaranteeing O(n) running time.