Algorithm Analysis: QuickSelect and Deterministic Selection

Quick Select Average Case Analysis

  • Recurrence Relation for Average Running Time (Similar to Quick Sort but for QuickSelect):

    • The recurrence relation is given by T(n)=n+2nj=n/2n1T(j)T(n) = n + \frac{2}{n} \sum_{j=\lfloor n/2 \rfloor}^{n-1} T(j).

    • Here, nn represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from n/2n/2 to n1n-1.

  • Goal: Find an exact expression or a tight bound for T(n)T(n).

  • Method: Using a guess function and proving it by mathematical induction.

  • Guess Function for Quick Select: We aim to prove that T(n)4nT(n) \le 4n for some constant 44.

    • Such a guess function is typical for bounding the growth rate.

  • Proof by Induction:

    • Basis Case ( n=1n=1 ):

      • For n=1n=1, T(1)=0T(1) = 0 (as the sum range is from 1/2=0\lfloor 1/2 \rfloor = 0 to 11=01-1=0, an empty sum, and selecting from a single element array takes no additional recursive work).

      • We need to check if T(1)41T(1) \le 4 \cdot 1 remains true. 040 \le 4 is true.

    • Inductive Hypothesis: Assume that for some integer k > 1, T(j)4jT(j) \le 4j for all integers jj such that 1 \le j < k.

    • Inductive Step: Prove that T(k)4kT(k) \le 4k.

      • Using the recurrence relation for T(k)T(k): T(k)=(k1)+2kj=k/2k1T(j)T(k) = (k-1) + \frac{2}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} T(j) (The initial nn term is for partitioning, assumed to be k1k-1 here).

      • Apply the inductive hypothesis (T(j)4jT(j) \le 4j) to the summation:
        T(k)(k1)+2kj=k/2k1(4j){T(k) \le (k-1) + \frac{2}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} (4j)}

      • Factor out the constant 44:
        T(k)(k1)+8kj=k/2k1j{T(k) \le (k-1) + \frac{8}{k} \sum_{j=\lfloor k/2 \rfloor}^{k-1} j}

      • Calculate the sum of the arithmetic series j=k/2k1j\sum_{j=\lfloor k/2 \rfloor}^{k-1} j. Assuming kk is even for simplicity, the sum range is from k/2k/2 to k1k-1.

        • Number of terms: (k1)(k/2)+1=k/2(k-1) - (k/2) + 1 = k/2.

        • Sum: (num terms)(first term+last term)2=(k/2)(k/2+k1)2=(k/2)(3k/21)2=k(3k2)8\frac{\text{(num terms)} \cdot (\text{first term} + \text{last term})}{2} = \frac{(k/2) \cdot (k/2 + k-1)}{2} = \frac{(k/2) \cdot (3k/2 - 1)}{2} = \frac{k(3k - 2)}{8}.

      • Substitute the sum back into the inequality:
        T(k)(k1)+8k(k(3k2)8){T(k) \le (k-1) + \frac{8}{k} \left( \frac{k(3k - 2)}{8} \right)}
        T(k)(k1)+(3k2){T(k) \le (k-1) + (3k - 2)}
        T(k)4k3{T(k) \le 4k - 3}

      • Since 4k - 3 < 4k, the inequality T(k)4kT(k) \le 4k holds true.

  • Conclusion: By induction, the guess function is correct. The average running time of Quick Select is Θ(n)\Theta(n).

Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)

  • Problem: Quick Select has an average-case running time of Θ(n)\Theta(n), but its worst-case running time is Θ(n2)\Theta(n^2).

  • Question: Can we have a non-random (deterministic) algorithm that guarantees Θ(n)\Theta(n)worst-case running time for finding the ii-th smallest element?

  • Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.

  • Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to n2n^2. The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.

  • How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.

  • The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a

Quick Select Average Case Analysis

  • Recurrence Relation for Average Running Time (Similar to Quick Sort but for QuickSelect):- The recurrence relation is given by T(n)=n+2nn/2n1T(j)T(n) = n + \frac{2}{n} \sum_{\lfloor n/2 \rfloor}^{n-1} T(j).

    • Here, nn represents the work for partitioning, and the summation represents the recursive calls on one of the partitions, considering possible pivot positions from n/2n/2 to n1n-1.

  • Goal: Find an exact expression or a tight bound for T(n)T(n).

  • Method: Using a guess function and proving it by mathematical induction.

  • Guess Function for Quick Select: We aim to prove that T(n)4nT(n) \le 4n for some constant 44.- Such a guess function is typical for bounding the growth rate.

  • Proof by Induction:- Basis Case ( n=1n=1 ):- For n=1n=1, T(1)=0T(1) = 0 (as the sum range is from 1/2=0\lfloor 1/2 \rfloor = 0 to 11=01-1=0, an empty sum, and selecting from a single element array takes no additional recursive work).
    - We need to check if T(1)41T(1) \le 4 \cdot 1 remains true. 040 \le 4 is true.

    • Inductive Hypothesis: Assume that for some integer k > 1, T(j)4jT(j) \le 4j for all integers jj such that 1 \le j < k.

    • Inductive Step: Prove that T(k)4kT(k) \le 4k.- Using the recurrence relation for T(k)T(k): T(k)=(k1)+2kk/2k1T(j)T(k) = (k-1) + \frac{2}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} T(j) (The initial nn term is for partitioning, assumed to be k1k-1 here).

      • Apply the inductive hypothesis (T(j)4jT(j) \le 4j) to the summation:

        T(k)(k1)+2kk/2k1(4j){T(k) \le (k-1) + \frac{2}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} (4j)}

      • Factor out the constant 44:

        T(k)(k1)+8kk/2k1j{T(k) \le (k-1) + \frac{8}{k} \sum_{\lfloor k/2 \rfloor}^{k-1} j}

      • Calculate the sum of the arithmetic series k/2k1j\sum_{\lfloor k/2 \rfloor}^{k-1} j. Assuming kk is even for simplicity, the sum range is from k/2k/2 to k1k-1.- Number of terms: (k1)(k/2)+1=k/2(k-1) - (k/2) + 1 = k/2.

        • Sum: (num terms)(first term+last term)2=(k/2)(k/2+k1)2=(k/2)(3k/21)2=k(3k2)8\frac{\text{(num terms)} \cdot (\text{first term} + \text{last term})}{2} = \frac{(k/2) \cdot (k/2 + k-1)}{2} = \frac{(k/2) \cdot (3k/2 - 1)}{2} = \frac{k(3k - 2)}{8}.

      • Substitute the sum back into the inequality:

        T(k)(k1)+8k(k(3k2)8){T(k) \le (k-1) + \frac{8}{k} \left( \frac{k(3k - 2)}{8} \right)}

        T(k)(k1)+(3k2){T(k) \le (k-1) + (3k - 2)}

        T(k)4k3{T(k) \le 4k - 3}

      • Since 4k - 3 < 4k, the inequality T(k)4kT(k) \le 4k holds true.

  • Conclusion: By induction, the guess function is correct. The average running time of Quick Select is Θ(n)\Theta(n).

Deterministic Linear-Time Selection Algorithm (BFPRT / Median-of-Medians)

  • Problem: Quick Select has an average-case running time of Θ(n)\Theta(n), but its worst-case running time is Θ(n2)\Theta(n^2).

  • Question: Can we have a non-random (deterministic) algorithm that guarantees Θ(n)\Theta(n)worst-case running time for finding the ii-th smallest element?

  • Answer: Yes, the BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1972) achieves this.

  • Core Idea: The main challenge in QuickSelect's worst-case performance is a bad pivot choice. If the pivot consistently produces unbalanced partitions, the depth of recursion increases to n2n^2. The BFPRT algorithm's innovation is to deterministically find a good pivot that guarantees a balanced partition, thus ensuring linear time in the worst case.

  • How to Guarantee a Balanced Partition: A pivot that is close to the true median of the array (e.g., within 30-70% splits) would ensure balanced partitions.

  • The Trick (Finding an Approximate Median): Directly computing the median of all elements would be solving the same problem. So, BFPRT finds a median of medians using a recursive approach:

    1. Divide into Groups: Divide the nn elements into groups of 55. (If nn is not a multiple of 55, the last group will have fewer than 55 elements).

    2. Find Medians of Groups: Find the median of each of these n/5\lceil n/5 \rceil groups. Sorting 55 elements takes a constant amount of time.

    3. Recursive Median-of-Medians: Recursively call the BFPRT algorithm on the n/5\lceil n/5 \rceil medians found in the previous step to find their median, m<em>mm<em>m. This m</em>mm</em>m is chosen as the pivot.

    4. Partition: Partition the original array around this pivot mmm_m.

    5. Recurse: Recursively call the algorithm on the appropriate partition (the one containing the ii-th smallest element). If the ii-th element is the pivot, return it.

  • Why this pivot is good: This "median of medians" pivot (mmm_m) guarantees that at least 3/103/10 of the elements are on one side of the pivot and at most 7/107/10 on the other side. This ensures a worst-case split of no worse than 3070%30-70 \%, preventing the O(n2)O(n^2) worst-case behavior of standard QuickSelect and guaranteeing O(n)O(n) running time.