Algorithm Analysis: QuickSelect and Deterministic Selection

Quick Select Average Case Analysis

Recurrence Relation for Average Running Time (Similar to Quick Sort but for QuickSelect):
- The recurrence relation is given by $T(n) = n + \frac{2}{n} \sum_{j=\lfloor n/2 \rfloor}^{n-1} T(j)$ .
- Here, $n$ represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from $n/2$ to $n-1$ .
Goal: Find an exact expression or a tight bound for $T(n)$ .
Method: Using a guess function and proving it by mathematical induction.
Guess Function for Quick Select: We aim to prove that $T(n) \le 4n$ for some constant $4$ .
- Such a guess function is typical for bounding the growth rate.
Proof by Induction:
- Basis Case ( $n=1$ ):
  - For $n=1$ , $T(1) = 0$ (as the sum range is from $\lfloor 1/2 \rfloor = 0$ to $1-1=0$ , an empty sum, and selecting from a single element array takes no additional recursive work).
  - We need to check if $T(1) \le 4 \cdot 1$ remains true. $0 \le 4$ is true.
- Inductive Hypothesis: Assume that for some integer k > 1, $T(j) \le 4j$ for all integers $j$ such that 1 \le j < k.
- Inductive Step: Prove that $T(k) \le 4k$ .
  - Using the recurrence relation for $T(k)$ : $T(k) = (k-1) + \frac{2}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} T(j)$ (The initial $n$ term is for partitioning, assumed to be $k-1$ here).
  - Apply the inductive hypothesis ( $T(j) \le 4j$ ) to the summation:
    ${T(k) \le (k-1) + \frac{2}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} (4j)}$
  - Factor out the constant $4$ :
    ${T(k) \le (k-1) + \frac{8}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} j}$
  - Calculate the sum of the arithmetic series $\sum_{j=\lfloor k/2 \rfloor}^{k-1} j$ . Assuming $k$ is even for simplicity, the sum range is from $k/2$ to $k-1$ .
    - Number of terms: $(k-1) - (k/2) + 1 = k/2$ .
    - Sum: $\frac{\text{(num terms)} \cdot (\text{first term} + \text{last term})}{2} = \frac{(k/2) \cdot (k/2 + k-1)}{2} = \frac{(k/2) \cdot (3k/2 - 1)}{2} = \frac{k(3k - 2)}{8}$ .
  - Substitute the sum back into the inequality:
    ${T(k) \le (k-1) + \frac{8}{k} \left( \frac{k(3k - 2)}{8} \right)}$
    ${T(k) \le (k-1) + (3k - 2)}$
    ${T(k) \le 4k - 3}$
  - Since 4k - 3 < 4k, the inequality $T(k) \le 4k$ holds true.
Conclusion: By induction, the guess function is correct. The average running time of Quick Select is $\Theta(n)$ .

Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)

Problem: Quick Select has an average-case running time of $\Theta(n)$ , but its worst-case running time is $\Theta(n^2)$ .
Question: Can we have a non-random (deterministic) algorithm that guarantees $\Theta(n)$ worst-case running time for finding the $i$ -th smallest element?
Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.
Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to $n^2$ . The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.
How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.
The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a

Quick Select Average Case Analysis

Recurrence Relation for Average Running Time (Similar to Quick Sort but for QuickSelect):- The recurrence relation is given by $T(n) = n + \frac{2}{n} \sum_{\lfloor n/2 \rfloor}^{n-1} T(j)$ .
- Here, $n$ represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from $n/2$ to $n-1$ .
Goal: Find an exact expression or a tight bound for $T(n)$ .
Method: Using a guess function and proving it by mathematical induction.
Guess Function for Quick Select: We aim to prove that $T(n) \le 4n$ for some constant $4$ .- Such a guess function is typical for bounding the growth rate.
Proof by Induction:- Basis Case ( $n=1$ ):- For $n=1$ , $T(1) = 0$ (as the sum range is from $\lfloor 1/2 \rfloor = 0$ to $1-1=0$ , an empty sum, and selecting from a single element array takes no additional recursive work).
- We need to check if $T(1) \le 4 \cdot 1$ remains true. $0 \le 4$ is true.
- Inductive Hypothesis: Assume that for some integer k > 1, $T(j) \le 4j$ for all integers $j$ such that 1 \le j < k.
- Inductive Step: Prove that $T(k) \le 4k$ .- Using the recurrence relation for $T(k)$ : $T(k) = (k-1) + \frac{2}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} T(j)$ (The initial $n$ term is for partitioning, assumed to be $k-1$ here).
  - Apply the inductive hypothesis ( $T(j) \le 4j$ ) to the summation:
    ${T(k) \le (k-1) + \frac{2}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} (4j)}$
  - Factor out the constant $4$ :
    ${T(k) \le (k-1) + \frac{8}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} j}$
  - Calculate the sum of the arithmetic series $\sum_{\lfloor k/2 \rfloor}^{k-1} j$ . Assuming $k$ is even for simplicity, the sum range is from $k/2$ to $k-1$ .- Number of terms: $(k-1) - (k/2) + 1 = k/2$ .
    - Sum: $\frac{\text{(num terms)} \cdot (\text{first term} + \text{last term})}{2} = \frac{(k/2) \cdot (k/2 + k-1)}{2} = \frac{(k/2) \cdot (3k/2 - 1)}{2} = \frac{k(3k - 2)}{8}$ .
  - Substitute the sum back into the inequality:
    ${T(k) \le (k-1) + \frac{8}{k} \left( \frac{k(3k - 2)}{8} \right)}$
    ${T(k) \le (k-1) + (3k - 2)}$
    ${T(k) \le 4k - 3}$
  - Since 4k - 3 < 4k, the inequality $T(k) \le 4k$ holds true.
Conclusion: By induction, the guess function is correct. The average running time of Quick Select is $\Theta(n)$ .

Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)

Problem: Quick Select has an average-case running time of $\Theta(n)$ , but its worst-case running time is $\Theta(n^2)$ .
Question: Can we have a non-random (deterministic) algorithm that guarantees $\Theta(n)$ worst-case running time for finding the $i$ -th smallest element?
Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.
Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to $n^2$ . The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.
How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.
The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a median of medians using a recursive approach:
1. Divide into Groups: Divide the $n$ elements into groups of $5$ . (If $n$ is not a multiple of $5$ , the last group will have fewer than $5$ elements).
2. Find Medians of Groups: Find the median of each of these $\lceil n/5 \rceil$ groups. Sorting $5$ elements takes a constant amount of time.
3. Recursive Median-of-Medians: Recursively call the BFPRT algorithm on the $\lceil n/5 \rceil$ medians found in the previous step to find their median, $m<em>m$ . This $m</em>m$ is chosen as the pivot.
4. Partition: Partition the original array around this pivot $m_m$ .
5. Recurse: Recursively call the algorithm on the appropriate partition (the one containing the $i$ -th smallest element). If the $i$ -th element is the pivot, return it.
Why this pivot is good: This "median of medians" pivot ( $m_m$ ) guarantees that at least $3/10$ of the elements are on one side of the pivot and at most $7/10$ on the other side. This ensures a worst-case split of no worse than $30-70 \%$ , preventing the $O(n^2)$ worst-case behavior of standard QuickSelect and guaranteeing $O(n)$ running time.