Notes: Expectation, Linear Search, Insertion Sort, Shuffle, and QuickSort (Probabilistic Analysis)
Expectation, Distributions, and Algorithm Analysis
- The transcript emphasizes building intuition for expectation (E[X]) and how it depends on the underlying probability space.
- Core definition:
- Suppose there is a probability distribution over the events in a sample space S, and a random variable X that assigns a value x(e) to each event e ∈ S. Then the expectation (mean) is
\mathbb{E}[X] \equiv \sum_{e \in S} P(e) \cdot x(e)
- The value of the expectation depends on the dimension of the variable; here we focus on a single scalar variable (dimension 1). If the variable were a vector, the mean would be taken per dimension.
- Important modeling note:
- There is always an assumed probability distribution over the events in S: each event has probability in [0,1] and the probabilities sum to 1.
- If you change the distribution, the computed expectation changes. The choice of distribution should reflect the application.
- A random variable X is, formally, a function on the sample space: its value is determined by which event occurs, and we write x(e) for the value X takes when event e occurs.
- If you have multiple variables, you typically model their joint behavior with a probability space; the “mean value” concept extends to each dimension, but the simplifications in the transcript assume a single scalar variable.
- The transcript cautions that average analysis (e.g., average running time) depends on the assumed probability distribution; uniform distribution is a common teaching example, but may not reflect real usage.
- When the event space is partitioned into disjoint sets A_1, A_2, …, A_m, with probabilities P(A_i) and conditional expectations E[X | A_i], the overall expectation can be computed by
\mathbb{E}[X] = \sum_{i=1}^m \mathbb{E}[X \mid A_i] \cdot P(A_i)
- The materials illustrate that average-case analyses can be performed in multiple ways and may yield different constants but the same asymptotic growth, depending on the distribution assumptions.
- Big picture takeaway: use probability models that reflect the problem, then apply linearity of expectation (where applicable) to compute averages; independence is not always required for summing expectations.
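The definition above can be checked directly on a small finite sample space. A minimal sketch (the `expectation` helper and the loaded-die weights are illustrative assumptions, not from the transcript), showing that changing the distribution changes the expectation:

```python
from fractions import Fraction

def expectation(dist):
    """Expectation of X over a finite sample space.

    dist: list of (probability, value) pairs, one per event e in S,
    with probabilities summing to 1.
    """
    assert sum(p for p, _ in dist) == 1, "probabilities must sum to 1"
    return sum(p * x for p, x in dist)

# A fair six-sided die: uniform distribution over values 1..6.
die = [(Fraction(1, 6), v) for v in range(1, 7)]
print(expectation(die))    # 7/2

# Different distribution, same values: a die loaded toward 6
# (hypothetical weights) has a larger mean, 19/4.
loaded = [(Fraction(1, 12), v) for v in range(1, 6)] + [(Fraction(7, 12), 6)]
print(expectation(loaded))  # 19/4
```

Using `Fraction` keeps the arithmetic exact, so the two means can be compared without floating-point noise.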
Linear Search: Best, Worst, and Average Running Time
- Problem setup (linear search): Given an array A of finite size n and a target s, return an index where s occurs or a sentinel value if not present.
- Return value conventions in the transcript: an index for found, and a negative value (e.g., -1) if not in the array.
- Best-case running time: the first element is s.
- Time complexity: \Theta(1)
- Worst-case running time: s is at the last position or not present at all.
- Time complexity: \Theta(n)
- Average-case running time (intuition under a simple model): if s is guaranteed to be in the array and is equally likely to be at any position 1 through n, the number of comparisons is uniformly distributed over {1, 2, …, n}.
- Expected number of comparisons:
\mathbb{E}[\text{comparisons}] = \sum_{i=1}^n i \cdot \Pr[\text{position}=i] = \frac{1}{n} \sum_{i=1}^n i = \frac{n+1}{2}
- Therefore, the average time is \Theta(n).
- Caveat: the average cannot be determined without specifying a distribution:
- The simplistic half-the-elements intuition (average ≈ n/2) hinges on a uniform distribution of the target’s position. If the distribution is different, the average changes.
- If you also include the possibility that s is not in A, you must add the probability of the “not found” case and adjust the expectation accordingly.
- Key learning: average running time is not universally determined by best/worst cases alone; you must specify the underlying probability distribution of where the target lies.
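The search and its average under the uniform-position model can be sketched as follows (the helper names are illustrative, not from the transcript):

```python
def linear_search(A, s):
    """Scan A left to right; return an index where s occurs, or -1 if absent."""
    for i, v in enumerate(A):
        if v == s:
            return i
    return -1

def avg_comparisons_uniform(n):
    """Average comparisons assuming s is equally likely at each of the
    n positions: (1/n) * sum(i for i = 1..n) = (n + 1) / 2."""
    return sum(range(1, n + 1)) / n

print(avg_comparisons_uniform(9))  # 5.0, i.e. (9 + 1) / 2
```

Note that the model behind `avg_comparisons_uniform` excludes the "not found" case; including it requires the partitioned-expectation formula from the first section.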
Card Shuffling (Derangements) and the Fix Points Expectation
- Modeling a random shuffle as a random permutation of n cards.
- Define indicator variables for fixed points:
- Let X_i = 1 if card i ends in its original position after the shuffle, and 0 otherwise, for i = 1, …, n.
- The total number of cards staying in their original positions is
X = \sum_{i=1}^n X_i
- Uniform random permutation implies symmetry:
- For each i, the probability that card i stays in its original position is
\Pr(X_i = 1) = \frac{1}{n}
- Hence, \mathbb{E}[X_i] = \frac{1}{n}.
- By linearity of expectation (note: independence is not required for linearity of expectation), the expected number of fixed points is
\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = n \cdot \frac{1}{n} = 1
- Important correction to a common pitfall in the transcript:
- The calculation above does not require the X_i to be independent (they are not). Linearity of expectation holds regardless of independence.
- Real-world note: the distribution of fixed points in a random permutation approaches a Poisson(1) distribution in the limit as n grows large.
- This example illustrates how to set up a probabilistic model (indicator random variables) and use linearity to obtain the expectation of a sum, even when dependencies exist.
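For a small deck, the expectation can be verified exactly by enumerating every permutation; by linearity the mean number of fixed points is exactly 1 for any n. A sketch (the helper name is illustrative):

```python
import math
from itertools import permutations

def fixed_points(perm):
    """Number of positions i with perm[i] == i (cards left in place)."""
    return sum(1 for i, v in enumerate(perm) if i == v)

# Exact mean over ALL permutations of n = 6 cards: linearity of
# expectation predicts exactly 1, independent of n.
n = 6
mean = sum(fixed_points(p) for p in permutations(range(n))) / math.factorial(n)
print(mean)  # 1.0
```

Exhaustive enumeration sidesteps sampling noise entirely, which makes the "exactly 1" claim easy to confirm for small n.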
Insertion Sort: Inversions, Best/Worst, and Average Case
- Key idea: the number of swaps/shifts performed by insertion sort is determined by the number of inversions in the input array.
- Definitions:
- An inversion is a pair (i, j) with i < j and A[i] > A[j].
- The total number of inversions I in an array of size n is at most C(n, 2) = n(n-1)/2 (worst case when array is in reverse order).
- Worst-case runtime (inversions drive work): if every pair forms an inversion (reverse-sorted input), the algorithm performs O(n^2) work.
- Specifically, worst-case time is Θ(n^2).
- Average-case runtime (random permutation model): for a random permutation of n distinct elements, any given pair (i, j) is inverted with probability 1/2.
- Number of possible pairs: C(n, 2) = n(n-1)/2.
- Expected number of inversions:
\mathbb{E}[I] = \sum_{1 \le i < j \le n} \Pr(A[i] > A[j]) = \binom{n}{2} \cdot \tfrac{1}{2} = \frac{n(n-1)}{4}.
- Therefore, the average running time of insertion sort is Θ(n^2); the expected inversion count n(n-1)/4 is exactly half the worst-case maximum n(n-1)/2.
- Alternative presentation (counting argument): the total number of possible inversions is C(n, 2); if each is inverted with probability 1/2, the expected inversions is C(n, 2)/2 = n(n-1)/4.
- Takeaway: there are multiple ways to compute the same average, and they yield the same asymptotic growth (Θ(n^2)) but may give different constants depending on modeling choices.
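The inversion argument can be checked in code: the shift count of insertion sort equals the inversion count, and exhaustive enumeration for small n reproduces the n(n-1)/4 average. A sketch (helper names are illustrative):

```python
import math
from itertools import combinations, permutations

def inversions(A):
    """Count pairs (i, j) with i < j and A[i] > A[j]."""
    return sum(1 for i, j in combinations(range(len(A)), 2) if A[i] > A[j])

def insertion_sort_shifts(A):
    """Insertion sort on a copy of A; returns the number of element
    shifts performed, which equals the number of inversions in A."""
    A = list(A)
    shifts = 0
    for i in range(1, len(A)):
        key, j = A[i], i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]   # shift one element right
            j -= 1
            shifts += 1
        A[j + 1] = key
    return shifts

# Exact average over all permutations of n = 6 distinct keys matches
# the indicator calculation: n(n-1)/4 = 7.5.
n = 6
avg = sum(inversions(p) for p in permutations(range(n))) / math.factorial(n)
print(avg)  # 7.5
```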
QuickSort: Partitioning, Pivot, and Average Number of Comparisons
- Core operations: QuickSort partitions an array around a pivot, recursively sorting the left and right subarrays.
- Notation and ideas used in the transcript:
- Let Z_{i j} denote the set of elements whose ranks in sorted order are i through j (inclusive).
- Define X_{i j} as an indicator variable for whether the elements of ranks i and j are ever compared during the execution of the algorithm.
- Relationship to total work:
- The total number of comparisons in QuickSort is
C_n = \sum_{1 \le i < j \le n} X_{i j}
- If we know the distribution of the X_{i j}, we can compute the expectation of C_n.
- Key probability for pairs being compared under a commonly used random-pivot assumption:
- For i < j, the probability that the elements of ranks i and j are compared is
\Pr(X_{i j} = 1) = \frac{2}{j - i + 1}
- This follows by considering Z_{i j}: the elements of ranks i and j are compared if and only if the first pivot chosen from among the j - i + 1 elements of Z_{i j} is one of those two elements; any other choice sends them into different subproblems, and they are never compared. Since each element of Z_{i j} is equally likely to be chosen first, the probability is 2/(j - i + 1).
- Expected number of comparisons (average-case analysis):
\mathbb{E}[C_n] = \sum_{1 \le i < j \le n} \Pr(X_{i j}=1) = \sum_{1 \le i < j \le n} \frac{2}{j-i+1}
- Evaluating the sum (outline):
- Let k = j - i + 1; then k ranges from 2 to n, and the number of pairs (i, j) with a given k is n - k + 1.
- Then
\mathbb{E}[C_n] = 2 \sum_{k=2}^n \frac{n - k + 1}{k} = 2(n+1) H_n - 4n = \Theta(n \log n),
where H_n = \sum_{k=1}^n \frac{1}{k} = \Theta(\log n).
- Resulting asymptotic: the average-case time for QuickSort is \mathbb{E}[C_n] = \Theta(n \log n).
- Best-case vs. worst-case (context from the transcript):
- Best case occurs when partitions are perfectly balanced at every step, yielding height ~ log_2 n and overall time ~ n log n (Θ(n log n)).
- Worst case occurs when the pivot produces highly unbalanced partitions at every step, yielding Θ(n^2) comparisons.
- Important modeling caveats:
- The 2/(j - i + 1) probability relies on a random-pivot and random-input model; changing the distribution (e.g., non-random pivot or skewed input) changes the constants and sometimes the growth rate.
- Takeaway: under reasonable randomness assumptions, QuickSort runs in O(n log n) on average; the exact constant depends on the distribution of pivots and inputs.
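A random-pivot quicksort that tallies comparisons reproduces the closed form empirically. A sketch (it partitions into fresh lists rather than in place; the comparison count follows the standard model, charging each non-pivot element one comparison against the pivot per partition step):

```python
import random

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def qs_comparisons(A, rng):
    """Quicksort on distinct keys with a uniformly random pivot; returns
    the number of element-to-pivot comparisons under the standard model."""
    if len(A) < 2:
        return 0
    pivot = A[rng.randrange(len(A))]
    left = [x for x in A if x < pivot]
    right = [x for x in A if x > pivot]
    # len(A) - 1 non-pivot elements, each charged one pivot comparison
    return (len(A) - 1) + qs_comparisons(left, rng) + qs_comparisons(right, rng)

# Empirical mean vs. the exact closed form 2(n+1)H_n - 4n.
rng = random.Random(1)
n, trials = 100, 2000
mean = sum(qs_comparisons(list(range(n)), rng) for _ in range(trials)) / trials
print(mean, 2 * (n + 1) * harmonic(n) - 4 * n)  # the two values nearly agree
```

Because the pivot is random, the input order is irrelevant, so a sorted input is fine here; with a deterministic first-element pivot the same input would trigger the Θ(n^2) worst case.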
Two-Level Probability, Distribution Choices, and Practical Takeaways
- The transcript repeatedly emphasizes that average-case results depend on the chosen probability model.
- You may have a distribution over inputs (e.g., a random permutation of keys) and, within that, a distribution over the algorithm’s random choices (e.g., random pivots in QuickSort).
- In some problems, you may have a two-level distribution: P(A_i) for input sets A_1, A_2, …, plus P(event | A_i) within each A_i.
- Formula for partitioning expectation when the sample space is partitioned:
\mathbb{E}[X] = \sum_{i} \mathbb{E}[X \mid A_i] \cdot P(A_i)
- Practical lesson: do not just memorize a single average-case result; understand the problem’s distribution and build a math model that reflects the application. Then derive the expectation accordingly.
- A common pitfall highlighted: adding an independence assumption when applying linearity of expectation is unnecessary; linearity holds regardless of independence, which is why you can sum expectations of indicator variables even when they are dependent.
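The partitioned-expectation formula can be exercised on a small two-level model, e.g. mixing a "found" and a "not found" case for linear search (the weight q and size n below are illustrative assumptions):

```python
from fractions import Fraction

def total_expectation(parts):
    """E[X] = sum_i E[X | A_i] * P(A_i) over a partition A_1, ..., A_m.

    parts: list of (P(A_i), E[X | A_i]) pairs; probabilities sum to 1.
    """
    assert sum(p for p, _ in parts) == 1, "partition probabilities must sum to 1"
    return sum(p * e for p, e in parts)

# Hypothetical model: with probability q the target is present at a
# uniform position (average (n+1)/2 comparisons); otherwise the scan
# makes all n comparisons before reporting "not found".
n, q = 10, Fraction(3, 4)
parts = [(q, Fraction(n + 1, 2)), (1 - q, Fraction(n))]
print(total_expectation(parts))  # 53/8
```

Varying q shows how the "not found" mass pulls the average toward n, which is exactly the adjustment the caveat above calls for.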
- Expectation over a finite sample space:
\mathbb{E}[X] = \sum_{e \in S} P(e) \cdot x(e)
- Linear search average (assuming s is in the array at a uniformly random position 1..n):
\mathbb{E}[\text{comparisons}] = \frac{1}{n} \sum_{i=1}^n i = \frac{n+1}{2} = \Theta(n)
- Card shuffling (fixed points) expectation:
\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = \sum_{i=1}^n \Pr(X_i=1) = n \cdot \frac{1}{n} = 1
- Inversions in insertion sort (average):
- Number of inversions
I = \sum_{1 \le i < j \le n} X_{ij},\quad X_{ij}=\begin{cases}1 & \text{if } A[i] > A[j],\\ 0 & \text{otherwise.}\end{cases}
\Pr(X_{ij}=1)=\tfrac{1}{2} \quad (\text{random permutation}),
\mathbb{E}[I] = \binom{n}{2} \cdot \tfrac{1}{2} = \frac{n(n-1)}{4}.
- QuickSort average-number of comparisons (indicator form):
C_n = \sum_{1 \le i < j \le n} X_{i j},\quad \Pr(X_{i j}=1) = \frac{2}{j-i+1},
\mathbb{E}[C_n] = \sum_{1 \le i < j \le n} \frac{2}{j-i+1} = 2 \sum_{k=2}^n \frac{n-k+1}{k} = 2(n+1) H_n - 4n = \Theta(n \log n)
- Harmonic numbers and growth:
H_n = \sum_{k=1}^n \frac{1}{k} = \Theta(\log n)
- Large-picture takeaways:
- Linear search: average time \Theta(n) under a simple position-uniform model.
- Insertion sort: average number of inversions is n(n-1)/4; average time \Theta(n^2).
- QuickSort: average time \Theta(n \log n) under random-pivot/random-input assumptions.
- The chosen probability model dramatically influences constants; asymptotic growth often remains the same across reasonable models, but not always.