Notes: Expectation, Linear Search, Insertion Sort, Shuffle, and QuickSort (Probabilistic Analysis)

Expectation, Distributions, and Algorithm Analysis

  • The transcript emphasizes building intuition for expectation (E[X]) and how it depends on the underlying probability space.
  • Core definition:
    • Suppose there is a probability distribution over the events in a sample space S, and a random variable X that assigns a value x(e) to each event e ∈ S. Then the expectation (mean) is
      \mathbb{E}[X] \equiv \sum_{e \in S} P(e) \cdot x(e)
    • Here we focus on a single scalar variable (dimension 1). If the variable were a vector, the mean would be taken per dimension (componentwise).
  • Important modeling note:
    • There is always an assumed probability distribution over the events in S: each event has probability in [0,1] and the probabilities sum to 1.
    • If you change the distribution, the computed expectation changes. The choice of distribution should reflect the application.
  • A single random variable can be written in function-like notation x(e): the realized value of X is determined by which event e occurs and the value x assigns to that event.
  • If you have multiple variables, you typically model their joint behavior with a probability space; the “mean value” concept extends to each dimension, but the simplifications in the transcript assume a single scalar variable.
  • The transcript cautions that average analysis (e.g., average running time) depends on the assumed probability distribution; uniform distribution is a common teaching example, but may not reflect real usage.
  • When the event space is partitioned into disjoint sets A_1, A_2, …, A_m, with probabilities P(A_i) and conditional expectations E[X | A_i], then the overall expectation can be computed by
    \mathbb{E}[X] = \sum_{i=1}^m \mathbb{E}[X \mid A_i] \cdot P(A_i).
  • The materials illustrate that average-case analyses can be performed in multiple ways and may yield different constants but the same asymptotic growth, depending on the distribution assumptions.
  • Big picture takeaway: use probability models that reflect the problem, then apply linearity of expectation (where applicable) to compute averages; independence is not always required for summing expectations.
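The definitions above can be checked with exact arithmetic. Below is a minimal Python sketch of E[X] = Σ P(e)·x(e) over a finite sample space, using a hypothetical fair die and a hypothetical loaded die to show that changing the distribution changes the expectation (both dice are illustrative choices, not from the transcript):

```python
from fractions import Fraction

# E[X] = sum over events e of P(e) * x(e), for a hypothetical fair six-sided die.
space = {e: Fraction(1, 6) for e in range(1, 7)}   # P(e) for each event e
x = lambda e: e                                    # x assigns a value to each event e

fair_expectation = sum(p * x(e) for e, p in space.items())
print(fair_expectation)  # 7/2

# Changing the distribution changes the expectation: a loaded die.
loaded = {1: Fraction(1, 2), 2: Fraction(1, 10), 3: Fraction(1, 10),
          4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 10)}
assert sum(loaded.values()) == 1                   # probabilities must sum to 1
loaded_expectation = sum(p * x(e) for e, p in loaded.items())
print(loaded_expectation)  # 5/2
```

Using `Fraction` keeps the probabilities exact, so the sum-to-1 check and the two expectations come out as exact rationals rather than floats.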

Linear Search: Best, Worst, and Average Running Time

  • Problem setup (linear search): Given an array A of finite size n and a target s, return an index where s occurs or a sentinel value if not present.
    • Return value conventions in the transcript: an index for found, and a negative value (e.g., -1) if not in the array.
  • Best-case running time: the first element is s.
    • Time complexity: \Theta(1)
  • Worst-case running time: s is at the last position or not present at all.
    • Time complexity: \Theta(n)
  • Average-case running time (intuition under a simple model): if s is guaranteed to be in the array and is equally likely to be at any position 1 through n, the number of comparisons is uniformly distributed over {1, 2, …, n}.
    • Expected number of comparisons:
      \mathbb{E}[\text{comparisons}] = \sum_{i=1}^n i \cdot \Pr[\text{position}=i] = \frac{1}{n} \sum_{i=1}^n i = \frac{n+1}{2}
    • Therefore, average time is \Theta(n).
  • Caveat: the average cannot be known without specifying a distribution:
    • The simplistic half-the-elements intuition (average ≈ n/2) hinges on a uniform distribution of the target’s position. If the distribution is different, the average changes.
    • If you also include the possibility that s is not in A, you must add the probability of the “not found” case and adjust the expectation accordingly.
  • Key learning: average running time is not universally determined by best/worst cases alone; you must specify the underlying probability distribution of where the target lies.
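The uniform-position model above is easy to verify directly. The sketch below (array contents and n are arbitrary choices) counts comparisons for every possible target position and averages them, recovering (n + 1)/2 exactly:

```python
def linear_search(a, s):
    """Return (index, comparisons_made); index is -1 if s is absent."""
    for i, v in enumerate(a):
        if v == s:
            return i, i + 1          # finding s at index i costs i + 1 comparisons
    return -1, len(a)                # the "not found" case costs n comparisons

# Under the uniform model (s equally likely at each of the n positions),
# the exact average number of comparisons is (n + 1) / 2.
n = 101                              # arbitrary size for the sketch
a = list(range(n))
avg = sum(linear_search(a, s)[1] for s in a) / n
print(avg)  # 51.0, i.e. (n + 1) / 2
```

Because every position is enumerated once, this is an exact average under the uniform model, not a simulation; weighting positions differently would change the result, as the caveat above notes.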

Card Shuffling (Derangements) and the Fix Points Expectation

  • Modeling a random shuffle as a random permutation of n cards.
  • Define indicator variables for fixed points:
    • Let X_i = 1 if card i ends in its original position after the shuffle, and 0 otherwise, for i = 1, …, n.
  • The total number of cards staying in their original positions is
    X = \sum_{i=1}^n X_i.
  • Uniform random permutation implies symmetry:
    • For each i, the probability that card i stays in its original position is
      \Pr(X_i = 1) = \frac{1}{n}.
    • Hence, \mathbb{E}[X_i] = \frac{1}{n}.
  • By linearity of expectation (note: independence is not required for linearity of expectation), the expected number of fixed points is
    \mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = n \cdot \frac{1}{n} = 1.
  • Important correction to a common pitfall in the transcript:
    • The calculation above does not require the X_i to be independent. Linearity of expectation holds regardless of independence.
  • Real-world note: the distribution of fixed points in a random permutation approaches a Poisson(1) distribution in the limit as n grows large.
  • This example illustrates how to set up a probabilistic model (indicator random variables) and use linearity to obtain the expectation of a sum, even when dependencies exist.
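A quick Monte Carlo check of the fixed-point expectation; the deck size and trial count below are arbitrary choices for the sketch:

```python
import random

def fixed_points(perm):
    """Count positions i where perm[i] == i (card i stayed in place)."""
    return sum(1 for i, v in enumerate(perm) if v == i)

# Monte Carlo estimate of E[X] for a uniform random permutation; the
# indicator argument above says the exact answer is 1 for every n.
random.seed(0)
n, trials = 52, 20000
total = 0
for _ in range(trials):
    perm = list(range(n))
    random.shuffle(perm)             # uniform random permutation
    total += fixed_points(perm)
print(total / trials)  # close to 1
```

The X_i here are clearly dependent (if 51 cards are fixed, so is the 52nd), yet the empirical mean still lands near 1, illustrating that linearity of expectation needs no independence.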

Insertion Sort: Inversions, Best/Worst, and Average Case

  • Key idea: the number of swaps/shifts performed by insertion sort is determined by the number of inversions in the input array.
  • Definitions:
    • An inversion is a pair (i, j) with i < j and A[i] > A[j].
    • The total number of inversions I in an array of size n is at most C(n, 2) = n(n-1)/2 (worst case when array is in reverse order).
  • Worst-case runtime (inversions drive work): if every pair forms an inversion (reverse-sorted input), the algorithm performs O(n^2) work.
    • Specifically, worst-case time is Θ(n^2).
  • Average-case runtime (random permutation model): for a random permutation of n distinct elements, any given pair (i, j) is inverted with probability 1/2.
    • Number of possible pairs: C(n, 2) = n(n-1)/2.
    • Expected number of inversions:
      \mathbb{E}[I] = \sum_{1 \le i < j \le n} \Pr(A[i] > A[j]) = \binom{n}{2} \cdot \tfrac{1}{2} = \frac{n(n-1)}{4}.
  • Therefore, the average running time of insertion sort is Θ(n^2); the expected inversion count n(n-1)/4 that drives the work carries a leading constant of 1/4.
  • Alternative presentation (counting argument): the total number of possible inversions is C(n, 2); if each is inverted with probability 1/2, the expected inversions is C(n, 2)/2 = n(n-1)/4.
  • Takeaway: there are multiple ways to compute the same average, and they yield the same asymptotic growth (Θ(n^2)) but may give different constants depending on modeling choices.
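The shifts-equal-inversions claim and the n(n-1)/4 average can both be sanity-checked in Python; the helper names and test sizes below are illustrative:

```python
import itertools
import random

def insertion_sort_shifts(a):
    """Insertion-sort a copy of a; return the number of element shifts performed."""
    a = a[:]
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]          # each shift removes exactly one inversion
            shifts += 1
            i -= 1
        a[i + 1] = key
    return shifts

def inversions(a):
    """Brute-force inversion count: pairs (i, j), i < j, with a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

# Shifts performed == inversions present, on random inputs.
random.seed(1)
for _ in range(100):
    a = random.sample(range(1000), 20)
    assert insertion_sort_shifts(a) == inversions(a)

# Averaging over ALL permutations of a small n gives exactly n(n-1)/4.
n = 5
perms = list(itertools.permutations(range(n)))
avg = sum(inversions(list(p)) for p in perms) / len(perms)
print(avg)  # 5.0 == n*(n-1)/4
```

Enumerating all n! permutations for small n makes the 1/2-per-pair symmetry argument concrete: each of the C(n, 2) pairs is inverted in exactly half of the permutations.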

QuickSort: Partitioning, Pivot, and Average Number of Comparisons

  • Core operations: QuickSort partitions an array around a pivot, recursively sorting the left and right subarrays.
  • Notation and ideas used in the transcript:
    • Let Z_{ij} denote the set of elements whose ranks in sorted order are i through j (inclusive), i.e., the i-th through j-th smallest elements.
    • Define X_{ij} as an indicator variable for whether the i-th and j-th smallest elements are compared at some point during the sort.
  • Relationship to total work:
    • The total number of comparisons in QuickSort is
      C_n = \sum_{1 \le i < j \le n} X_{ij}.
    • If we know the distribution of each X_{ij}, we can compute the expectation of C_n.
  • Key probability for pairs being compared under a commonly used random-pivot assumption:
    • For i < j, the probability that the i-th and j-th smallest elements are compared is
      \Pr(X_{ij} = 1) = \frac{2}{j - i + 1}.
    • This arises from considering Z_{ij}: the two elements are compared if and only if the first pivot chosen from among the j - i + 1 elements of Z_{ij} is one of the two extremes (the i-th or the j-th smallest). Any other first pivot from Z_{ij} separates them into different subproblems, so they are never compared. Since each element of Z_{ij} is equally likely to be the first pivot chosen from it, the probability is 2/(j - i + 1).
  • Expected number of comparisons (average-case analysis):
    \mathbb{E}[C_n] = \sum_{1 \le i < j \le n} \Pr(X_{ij}=1) = \sum_{1 \le i < j \le n} \frac{2}{j-i+1}.
  • Evaluating the sum (outline):
    • Group pairs by gap: let k = j - i + 1; then k ranges from 2 to n, and the number of pairs (i, j) with a given k is n - k + 1.
    • Then
      \mathbb{E}[C_n] = 2 \sum_{k=2}^n \frac{n - k + 1}{k} = 2(n+1)H_n - 4n = \Theta(n \log n), \quad\text{where } H_n = \sum_{k=1}^n \frac{1}{k} = \Theta(\log n).
  • Resulting asymptotic: the average-case time for QuickSort is \mathbb{E}[C_n] = \Theta(n \log n).
  • Best-case vs. worst-case (context from the transcript):
    • Best case occurs when partitions are perfectly balanced at every step, yielding height ~ log_2 n and overall time ~ n log n (Θ(n log n)).
    • Worst case occurs when the pivot produces highly unbalanced partitions at every step, yielding Θ(n^2) comparisons.
  • Important modeling caveats:
    • The 2/(j - i + 1) probability relies on a random-pivot and random-input model; changing the distribution (e.g., non-random pivot or skewed input) changes the constants and sometimes the growth rate.
  • Takeaway: under reasonable randomness assumptions, QuickSort runs in O(n log n) on average; the exact constant depends on the distribution of pivots and inputs.
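A small simulation makes the average-case claim tangible. The sketch below uses a Lomuto partition with a uniform random pivot (one concrete realization of the random-pivot model; sizes, seeds, and trial counts are arbitrary choices), counts partition comparisons, and checks the empirical average against the exact closed form 2(n+1)H_n − 4n for this scheme:

```python
import random

def quicksort_comparisons(a):
    """Randomized quicksort on a copy of a; return total partition comparisons."""
    count = 0
    def sort(lo, hi):
        nonlocal count
        if hi - lo < 1:
            return
        p = random.randint(lo, hi)          # uniform random pivot
        a[p], a[hi] = a[hi], a[p]           # move pivot to the end (Lomuto)
        pivot, store = a[hi], lo
        for k in range(lo, hi):
            count += 1                      # one comparison per non-pivot element
            if a[k] < pivot:
                a[store], a[k] = a[k], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # pivot into final position
        sort(lo, store - 1)
        sort(store + 1, hi)
    a = a[:]
    sort(0, len(a) - 1)
    return count

random.seed(2)
n, trials = 200, 500
H = sum(1 / k for k in range(1, n + 1))     # harmonic number H_n
expected = 2 * (n + 1) * H - 4 * n          # exact E[C_n] for this scheme
observed = sum(quicksort_comparisons(random.sample(range(10**6), n))
               for _ in range(trials)) / trials
print(round(expected, 1), round(observed, 1))  # the two should be close
```

The comparisons counted here are exactly the pairwise element comparisons of the indicator analysis (each pair is compared at most once, when one of them is the pivot), so the empirical mean tracks the Θ(n log n) closed form.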

Two-Level Probability, Distribution Choices, and Practical Takeaways

  • The transcript repeatedly emphasizes that average-case results depend on the chosen probability model.
    • You may have a distribution over inputs (e.g., a random permutation of keys) and, within that, a distribution over the algorithm’s random choices (e.g., random pivots in QuickSort).
    • In some problems, you may have a two-level distribution: P(A) for input sets A1, A2, …, plus P(event | Ai) within each Ai.
  • Formula for partitioning expectation when the sample space is partitioned:
    \mathbb{E}[X] = \sum_{i} \mathbb{E}[X \mid A_i] \cdot P(A_i).
  • Practical lesson: do not just memorize a single average-case result; understand the problem’s distribution and build a math model that reflects the application. Then derive the expectation accordingly.
  • A common pitfall highlighted: assuming that independence is required for linearity of expectation. Linearity holds regardless of independence, which is why you can sum expectations of indicator variables even when they are dependent.
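The partition formula can be illustrated with exact arithmetic. All groups, values, and probabilities below are made-up numbers for the sketch; the point is only that summing over the joint distribution and applying the partition formula give the same E[X]:

```python
from fractions import Fraction

# A hypothetical two-level model: first pick a group A_i with probability
# P(A_i), then a value x within it with probability P(x | A_i).
P_A = {"A1": Fraction(1, 4), "A2": Fraction(3, 4)}
cond = {                                   # P(x | A_i) within each group
    "A1": {10: Fraction(1, 2), 20: Fraction(1, 2)},
    "A2": {0: Fraction(2, 3), 30: Fraction(1, 3)},
}

# Direct expectation over the joint distribution P(A_i) * P(x | A_i).
direct = sum(P_A[a] * p * x for a, d in cond.items() for x, p in d.items())

# Same value via E[X] = sum_i E[X | A_i] * P(A_i).
cond_E = {a: sum(p * x for x, p in d.items()) for a, d in cond.items()}
partitioned = sum(cond_E[a] * P_A[a] for a in P_A)
print(direct, partitioned)  # 45/4 45/4
```

Swapping in different P(A_i) or conditional tables changes the number but never breaks the equality, which is the practical content of the partition formula.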

Summary of Key Formulas and Concepts (LaTeX)

  • Expectation over a finite sample space:
    \mathbb{E}[X] = \sum_{e \in S} P(e) \cdot x(e).
  • Linear search average (assuming s is in the array at a random position 1..n):
    \mathbb{E}[\text{comparisons}] = \frac{1}{n} \sum_{i=1}^n i = \frac{n+1}{2} = \Theta(n).
  • Card shuffling (fixed points) expectation:
    \mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = \sum_{i=1}^n \Pr(X_i=1) = n \cdot \frac{1}{n} = 1.
  • Inversions in insertion sort (average):
    • Number of inversions
      I = \sum_{1 \le i < j \le n} X_{ij},\quad X_{ij}=\begin{cases}1 & \text{if } A[i] > A[j],\\ 0 & \text{otherwise,}\end{cases} \qquad \Pr(X_{ij}=1)=\tfrac{1}{2} \quad (\text{random permutation}),
      \mathbb{E}[I] = \binom{n}{2} \cdot \tfrac{1}{2} = \frac{n(n-1)}{4}.
  • QuickSort average-number of comparisons (indicator form):
    C_n = \sum_{1 \le i < j \le n} X_{ij},\quad \Pr(X_{ij}=1) = \frac{2}{j-i+1},
    \mathbb{E}[C_n] = \sum_{1 \le i < j \le n} \frac{2}{j-i+1} = 2 \sum_{k=2}^n \frac{n-k+1}{k} = 2(n+1)H_n - 4n = \Theta(n \log n).
  • Harmonic numbers and growth:
    H_n = \sum_{k=1}^n \frac{1}{k} = \Theta(\log n).
  • Large-picture takeaways:
    • Linear search: average time Theta(n) under a simple position-uniform model.
    • Insertion sort: average number of inversions is n(n-1)/4; average time Theta(n^2).
    • QuickSort: average time Theta(n log n) under random-pivot/random-input assumptions.
    • The chosen probability model dramatically influences constants; asymptotic growth often remains the same across reasonable models, but not always.