Notes: Expectation, Linear Search, Insertion Sort, Shuffle, and QuickSort (Probabilistic Analysis)

Expectation, Distributions, and Algorithm Analysis

  • The transcript emphasizes building intuition for expectation (E[X]) and how it depends on the underlying probability space.
  • Core definition:
    • Suppose there is a probability distribution over the events in a sample space S, and a random variable X that assigns a value x(e) to each event e ∈ S. Then the expectation (mean) is
      \mathbb{E}[X] \equiv \sum_{e \in S} P(e) \cdot x(e)
    • Here we focus on a single scalar variable (dimension 1). If the variable were a vector, the mean would be taken per dimension (componentwise).
  • Important modeling note:
    • There is always an assumed probability distribution over the events in S: each event has probability in [0,1] and the probabilities sum to 1.
    • If you change the distribution, the computed expectation changes. The choice of distribution should reflect the application.
  • A single random variable can be written in function-like notation x(e): the realized value of X is determined by which event e occurs and the value x assigns to that event.
  • If you have multiple variables, you typically model their joint behavior with a probability space; the “mean value” concept extends to each dimension, but the simplifications in the transcript assume a single scalar variable.
  • The transcript cautions that average analysis (e.g., average running time) depends on the assumed probability distribution; uniform distribution is a common teaching example, but may not reflect real usage.
  • When the event space is partitioned into disjoint sets A_1, A_2, …, A_m, with probabilities P(A_i) and conditional expectations E[X | A_i], then the overall expectation can be computed by
    \mathbb{E}[X] = \sum_{i=1}^m \mathbb{E}[X \mid A_i] \cdot P(A_i).
  • The materials illustrate that average-case analyses can be performed in multiple ways and may yield different constants but the same asymptotic growth, depending on the distribution assumptions.
  • Big picture takeaway: use probability models that reflect the problem, then apply linearity of expectation (where applicable) to compute averages; independence is not always required for summing expectations.
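The definitions above can be checked with exact arithmetic. Below is a minimal Python sketch of E[X] = Σ P(e)·x(e) over a finite sample space, using a hypothetical fair die and a hypothetical loaded die to show that changing the distribution changes the expectation (both dice are illustrative choices, not from the transcript):

```python
from fractions import Fraction

# E[X] = sum over events e of P(e) * x(e), for a hypothetical fair six-sided die.
space = {e: Fraction(1, 6) for e in range(1, 7)}   # P(e) for each event e
x = lambda e: e                                    # x assigns a value to each event e

fair_expectation = sum(p * x(e) for e, p in space.items())
print(fair_expectation)  # 7/2

# Changing the distribution changes the expectation: a loaded die.
loaded = {1: Fraction(1, 2), 2: Fraction(1, 10), 3: Fraction(1, 10),
          4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 10)}
assert sum(loaded.values()) == 1                   # probabilities must sum to 1
loaded_expectation = sum(p * x(e) for e, p in loaded.items())
print(loaded_expectation)  # 5/2
```

Using `Fraction` keeps the probabilities exact, so the sum-to-1 check and the two expectations come out as exact rationals rather than floats.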

Linear Search: Best, Worst, and Average Running Time

  • Problem setup (linear search): Given an array A of finite size n and a target s, return an index where s occurs or a sentinel value if not present.
    • Return value conventions in the transcript: an index for found, and a negative value (e.g., -1) if not in the array.
  • Best-case running time: the first element is s.
    • Time complexity: \Theta(1)
  • Worst-case running time: s is at the last position or not present at all.
    • Time complexity: \Theta(n)
  • Average-case running time (intuition under a simple model): if s is guaranteed to be in the array and is equally likely to be at any position 1 through n, the number of comparisons is uniformly distributed over {1, 2, …, n}.
    • Expected number of comparisons:
      \mathbb{E}[\text{comparisons}] = \sum_{i=1}^n i \cdot \Pr[\text{position}=i] = \frac{1}{n} \sum_{i=1}^n i = \frac{n+1}{2}
    • Therefore, average time is \Theta(n).
  • Caveat: the average cannot be known without specifying a distribution:
    • The simplistic half-the-elements intuition (average ≈ n/2) hinges on a uniform distribution of the target’s position. If the distribution is different, the average changes.
    • If you also include the possibility that s is not in A, you must add the probability of the “not found” case and adjust the expectation accordingly.
  • Key learning: average running time is not universally determined by best/worst cases alone; you must specify the underlying probability distribution of where the target lies.
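The uniform-position model above is easy to verify directly. The sketch below (array contents and n are arbitrary choices) counts comparisons for every possible target position and averages them, recovering (n + 1)/2 exactly:

```python
def linear_search(a, s):
    """Return (index, comparisons_made); index is -1 if s is absent."""
    for i, v in enumerate(a):
        if v == s:
            return i, i + 1          # finding s at index i costs i + 1 comparisons
    return -1, len(a)                # the "not found" case costs n comparisons

# Under the uniform model (s equally likely at each of the n positions),
# the exact average number of comparisons is (n + 1) / 2.
n = 101                              # arbitrary size for the sketch
a = list(range(n))
avg = sum(linear_search(a, s)[1] for s in a) / n
print(avg)  # 51.0, i.e. (n + 1) / 2
```

Because every position is enumerated once, this is an exact average under the uniform model, not a simulation; weighting positions differently would change the result, as the caveat above notes.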

Card Shuffling (Derangements) and the Fix Points Expectation

  • Modeling a random shuffle as a random permutation of n cards.
  • Define indicator variables for fixed points:
    • Let X_i = 1 if card i ends in its original position after the shuffle, and 0 otherwise, for i = 1, …, n.
  • The total number of cards staying in their original positions is
    X = \sum_{i=1}^n X_i.
  • Uniform random permutation implies symmetry:
    • For each i, the probability that card i stays in its original position is
      \Pr(X_i = 1) = \frac{1}{n}.
    • Hence, \mathbb{E}[X_i] = \frac{1}{n}.
  • By linearity of expectation (note: independence is not required for linearity of expectation), the expected number of fixed points is
    \mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = n \cdot \frac{1}{n} = 1.
  • Important correction to a common pitfall in the transcript:
    • The calculation above does not require the X_i to be independent. Linearity of expectation holds regardless of independence.
  • Real-world note: the distribution of fixed points in a random permutation approaches a Poisson(1) distribution in the limit as n grows large.
  • This example illustrates how to set up a probabilistic model (indicator random variables) and use linearity to obtain the expectation of a sum, even when dependencies exist.
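A quick Monte Carlo check of the fixed-point expectation; the deck size and trial count below are arbitrary choices for the sketch:

```python
import random

def fixed_points(perm):
    """Count positions i where perm[i] == i (card i stayed in place)."""
    return sum(1 for i, v in enumerate(perm) if v == i)

# Monte Carlo estimate of E[X] for a uniform random permutation; the
# indicator argument above says the exact answer is 1 for every n.
random.seed(0)
n, trials = 52, 20000
total = 0
for _ in range(trials):
    perm = list(range(n))
    random.shuffle(perm)             # uniform random permutation
    total += fixed_points(perm)
print(total / trials)  # close to 1
```

The X_i here are clearly dependent (if 51 cards are fixed, so is the 52nd), yet the empirical mean still lands near 1, illustrating that linearity of expectation needs no independence.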

Insertion Sort: Inversions, Best/Worst, and Average Case

  • Key idea: the number of swaps/shifts performed by insertion sort is determined by the number of inversions in the input array.
  • Definitions:
    • An inversion is a pair (i, j) with i < j and A[i] > A[j].
    • The total number of inversions I in an array of size n is at most C(n, 2) = n(n-1)/2 (worst case when array is in reverse order).
  • Worst-case runtime (inversions drive work): if every pair forms an inversion (reverse-sorted input), the algorithm performs O(n^2) work.
    • Specifically, worst-case time is Θ(n^2).
  • Average-case runtime (random permutation model): for a random permutation of n distinct elements, any given pair (i, j) is inverted with probability 1/2.
    • Number of possible pairs: C(n, 2) = n(n-1)/2.
    • Expected number of inversions:
      \mathbb{E}[I] = \sum_{1 \le i < j \le n} \Pr(A[i] > A[j]) = \binom{n}{2} \cdot \tfrac{1}{2} = \frac{n(n-1)}{4}.
  • Therefore, the average running time of insertion sort is Θ(n^2); the expected inversion count n(n-1)/4 that drives the work carries a leading constant of 1/4.
  • Alternative presentation (counting argument): the total number of possible inversions is C(n, 2); if each is inverted with probability 1/2, the expected inversions is C(n, 2)/2 = n(n-1)/4.
  • Takeaway: there are multiple ways to compute the same average, and they yield the same asymptotic growth (Θ(n^2)) but may give different constants depending on modeling choices.
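The shifts-equal-inversions claim and the n(n-1)/4 average can both be sanity-checked in Python; the helper names and test sizes below are illustrative:

```python
import itertools
import random

def insertion_sort_shifts(a):
    """Insertion-sort a copy of a; return the number of element shifts performed."""
    a = a[:]
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]          # each shift removes exactly one inversion
            shifts += 1
            i -= 1
        a[i + 1] = key
    return shifts

def inversions(a):
    """Brute-force inversion count: pairs (i, j), i < j, with a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

# Shifts performed == inversions present, on random inputs.
random.seed(1)
for _ in range(100):
    a = random.sample(range(1000), 20)
    assert insertion_sort_shifts(a) == inversions(a)

# Averaging over ALL permutations of a small n gives exactly n(n-1)/4.
n = 5
perms = list(itertools.permutations(range(n)))
avg = sum(inversions(list(p)) for p in perms) / len(perms)
print(avg)  # 5.0 == n*(n-1)/4
```

Enumerating all n! permutations for small n makes the 1/2-per-pair symmetry argument concrete: each of the C(n, 2) pairs is inverted in exactly half of the permutations.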

QuickSort: Partitioning, Pivot, and Average Number of Comparisons

  • Core operations: QuickSort partitions an array around a pivot, recursively sorting the left and right subarrays.
  • Notation and ideas used in the transcript:
    • Let Z_{ij} denote the set of elements whose ranks in sorted order are i through j (inclusive), i.e., the i-th through j-th smallest elements.
    • Define X_{ij} as an indicator variable for whether the i-th and j-th smallest elements are compared at some point during the sort.
  • Relationship to total work:
    • The total number of comparisons in QuickSort is
      C_n = \sum_{1 \le i < j \le n} X_{ij}.
    • If we know the distribution of each X_{ij}, we can compute the expectation of C_n.
  • Key probability for pairs being compared under a commonly used random-pivot assumption:
    • For i < j, the probability that the i-th and j-th smallest elements are compared is
      \Pr(X_{ij} = 1) = \frac{2}{j - i + 1}.
    • This arises from considering Z_{ij}: the two elements are compared if and only if the first pivot chosen from among the j - i + 1 elements of Z_{ij} is one of the two extremes (the i-th or the j-th smallest). Any other first pivot from Z_{ij} separates them into different subproblems, so they are never compared. Since each element of Z_{ij} is equally likely to be the first pivot chosen from it, the probability is 2/(j - i + 1).
  • Expected number of comparisons (average-case analysis):
    \mathbb{E}[C_n] = \sum_{1 \le i < j \le n} \Pr(X_{ij}=1) = \sum_{1 \le i < j \le n} \frac{2}{j-i+1}.
  • Evaluating the sum (outline):
    • Group pairs by gap: let k = j - i + 1; then k ranges from 2 to n, and the number of pairs (i, j) with a given k is n - k + 1.
    • Then
      \mathbb{E}[C_n] = 2 \sum_{k=2}^n \frac{n - k + 1}{k} = 2(n+1)H_n - 4n = \Theta(n \log n), \quad\text{where } H_n = \sum_{k=1}^n \frac{1}{k} = \Theta(\log n).
  • Resulting asymptotic: the average-case time for QuickSort is \mathbb{E}[C_n] = \Theta(n \log n).
  • Best-case vs. worst-case (context from the transcript):
    • Best case occurs when partitions are perfectly balanced at every step, yielding height ~ log_2 n and overall time ~ n log n (Θ(n log n)).
    • Worst case occurs when the pivot produces highly unbalanced partitions at every step, yielding Θ(n^2) comparisons.
  • Important modeling caveats:
    • The 2/(j - i + 1) probability relies on a random-pivot and random-input model; changing the distribution (e.g., non-random pivot or skewed input) changes the constants and sometimes the growth rate.
  • Takeaway: under reasonable randomness assumptions, QuickSort runs in O(n log n) on average; the exact constant depends on the distribution of pivots and inputs.
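A small simulation makes the average-case claim tangible. The sketch below uses a Lomuto partition with a uniform random pivot (one concrete realization of the random-pivot model; sizes, seeds, and trial counts are arbitrary choices), counts partition comparisons, and checks the empirical average against the exact closed form 2(n+1)H_n − 4n for this scheme:

```python
import random

def quicksort_comparisons(a):
    """Randomized quicksort on a copy of a; return total partition comparisons."""
    count = 0
    def sort(lo, hi):
        nonlocal count
        if hi - lo < 1:
            return
        p = random.randint(lo, hi)          # uniform random pivot
        a[p], a[hi] = a[hi], a[p]           # move pivot to the end (Lomuto)
        pivot, store = a[hi], lo
        for k in range(lo, hi):
            count += 1                      # one comparison per non-pivot element
            if a[k] < pivot:
                a[store], a[k] = a[k], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # pivot into final position
        sort(lo, store - 1)
        sort(store + 1, hi)
    a = a[:]
    sort(0, len(a) - 1)
    return count

random.seed(2)
n, trials = 200, 500
H = sum(1 / k for k in range(1, n + 1))     # harmonic number H_n
expected = 2 * (n + 1) * H - 4 * n          # exact E[C_n] for this scheme
observed = sum(quicksort_comparisons(random.sample(range(10**6), n))
               for _ in range(trials)) / trials
print(round(expected, 1), round(observed, 1))  # the two should be close
```

The comparisons counted here are exactly the pairwise element comparisons of the indicator analysis (each pair is compared at most once, when one of them is the pivot), so the empirical mean tracks the Θ(n log n) closed form.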

Two-Level Probability, Distribution Choices, and Practical Takeaways

  • The transcript repeatedly emphasizes that average-case results depend on the chosen probability model.
    • You may have a distribution over inputs (e.g., a random permutation of keys) and, within that, a distribution over the algorithm’s random choices (e.g., random pivots in QuickSort).
    • In some problems, you may have a two-level distribution: P(A) for input sets A1, A2, …, plus P(event | Ai) within each Ai.
  • Formula for partitioning expectation when the sample space is partitioned:
    \mathbb{E}[X] = \sum_{i} \mathbb{E}[X \mid A_i] \cdot P(A_i).
  • Practical lesson: do not just memorize a single average-case result; understand the problem’s distribution and build a math model that reflects the application. Then derive the expectation accordingly.
  • A common pitfall highlighted: assuming that independence is required for linearity of expectation. Linearity holds regardless of independence, which is why you can sum expectations of indicator variables even when they are dependent.
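The partition formula can be illustrated with exact arithmetic. All groups, values, and probabilities below are made-up numbers for the sketch; the point is only that summing over the joint distribution and applying the partition formula give the same E[X]:

```python
from fractions import Fraction

# A hypothetical two-level model: first pick a group A_i with probability
# P(A_i), then a value x within it with probability P(x | A_i).
P_A = {"A1": Fraction(1, 4), "A2": Fraction(3, 4)}
cond = {                                   # P(x | A_i) within each group
    "A1": {10: Fraction(1, 2), 20: Fraction(1, 2)},
    "A2": {0: Fraction(2, 3), 30: Fraction(1, 3)},
}

# Direct expectation over the joint distribution P(A_i) * P(x | A_i).
direct = sum(P_A[a] * p * x for a, d in cond.items() for x, p in d.items())

# Same value via E[X] = sum_i E[X | A_i] * P(A_i).
cond_E = {a: sum(p * x for x, p in d.items()) for a, d in cond.items()}
partitioned = sum(cond_E[a] * P_A[a] for a in P_A)
print(direct, partitioned)  # 45/4 45/4
```

Swapping in different P(A_i) or conditional tables changes the number but never breaks the equality, which is the practical content of the partition formula.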

Summary of Key Formulas and Concepts (LaTeX)

  • Expectation over a finite sample space:
    \mathbb{E}[X] = \sum_{e \in S} P(e) \cdot x(e).
  • Linear search average (assuming s is in the array at a random position 1..n):
    \mathbb{E}[\text{comparisons}] = \frac{1}{n} \sum_{i=1}^n i = \frac{n+1}{2} = \Theta(n).
  • Card shuffling (fixed points) expectation:
    \mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = \sum_{i=1}^n \Pr(X_i=1) = n \cdot \frac{1}{n} = 1.
  • Inversions in insertion sort (average):
    • Number of inversions
      I = \sum_{1 \le i < j \le n} X_{ij},\quad X_{ij}=\begin{cases}1 & \text{if } A[i] > A[j],\\ 0 & \text{otherwise,}\end{cases} \qquad \Pr(X_{ij}=1)=\tfrac{1}{2} \quad (\text{random permutation}),
      \mathbb{E}[I] = \binom{n}{2} \cdot \tfrac{1}{2} = \frac{n(n-1)}{4}.
  • QuickSort average-number of comparisons (indicator form):
    C_n = \sum_{1 \le i < j \le n} X_{ij},\quad \Pr(X_{ij}=1) = \frac{2}{j-i+1},
    \mathbb{E}[C_n] = \sum_{1 \le i < j \le n} \frac{2}{j-i+1} = 2 \sum_{k=2}^n \frac{n-k+1}{k} = 2(n+1)H_n - 4n = \Theta(n \log n).
  • Harmonic numbers and growth:
    H_n = \sum_{k=1}^n \frac{1}{k} = \Theta(\log n).
  • Large-picture takeaways:
    • Linear search: average time Theta(n) under a simple position-uniform model.
    • Insertion sort: average number of inversions is n(n-1)/4; average time Theta(n^2).
    • QuickSort: average time Theta(n log n) under random-pivot/random-input assumptions.
    • The chosen probability model dramatically influences constants; asymptotic growth often remains the same across reasonable models, but not always.