Week 7

Expectation and Linear Transformations

  • Let X be a continuous random variable with probability density function (pdf) f(x).
  • For any real-valued function g, the expectation is given by
    E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
  • Special case: if Y = aX + b, then
    E[aX + b] = a E[X] + b,
    since
    E[aX + b] = \int_{-\infty}^{\infty} (a x + b) f(x) \, dx = a \int_{-\infty}^{\infty} x f(x) \, dx + b \int_{-\infty}^{\infty} f(x) \, dx = a E[X] + b.

Nonnegative g(X): an alternative expression for E[g(X)]

  • Theorem (nonnegative g): if g(x) ≥ 0 for all x, then
    E[g(X)] = \int_{0}^{\infty} \Pr(g(X) > y) \, dy.
  • Sketch of proof:
    • For nonnegative Y, we have E[Y] = \int_{0}^{\infty} \Pr(Y > y) \, dy.
    • Apply this with Y = g(X): since \Pr(g(X) > y) = \int_{\{x : g(x) > y\}} f(x) \, dx,
      interchanging the order of integration (Tonelli/Fubini) yields
      E[g(X)] = \int_{0}^{\infty} \int_{\{x : g(x) > y\}} f(x) \, dx \, dy = \int_{-\infty}^{\infty} \int_{0}^{g(x)} dy \, f(x) \, dx = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
  • This justifies using the tail integral representation for nonnegative functions of X.
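As a quick numeric sanity check of the tail-integral representation (an illustration, not part of the notes' derivation), take X ~ Uniform(0,1) and g(x) = x²; then Pr(g(X) > y) = 1 − √y on (0,1), and both sides equal 1/3:

```python
import math

# Check E[g(X)] = \int_0^inf Pr(g(X) > y) dy for X ~ Uniform(0,1), g(x) = x^2.
# Direct formula: E[X^2] = \int_0^1 x^2 dx = 1/3.
# Tail formula:   \int_0^1 (1 - sqrt(y)) dy, since Pr(X^2 > y) = 1 - sqrt(y) on (0,1).

def tail_integral(n_steps=100_000):
    # Midpoint Riemann sum of 1 - sqrt(y) over (0, 1); the integrand
    # vanishes for y >= 1, so the integration stops there.
    dy = 1.0 / n_steps
    return sum((1.0 - math.sqrt((k + 0.5) * dy)) * dy for k in range(n_steps))

print(tail_integral(), 1 / 3)  # both ≈ 0.3333
```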

Example: E[e^X] for X ~ Uniform(0,1)

  • Density:
    f(x) = \begin{cases} 1, & 0 \le x \le 1, \\ 0, & \text{otherwise}. \end{cases}
  • Directly,
    E[e^{X}] = \int_{0}^{1} e^{x} \, dx = e - 1.
  • Alternative method via Y = e^X:
    • CDF: for 1 ≤ x ≤ e,
      F_Y(x) = \Pr(Y \le x) = \Pr(e^{X} \le x) = \Pr(X \le \log x) = \int_{0}^{\log x} f(t) \, dt = \log x.
    • Therefore pdf:
      f_Y(x) = F_Y'(x) = \frac{1}{x}, \quad 1 \le x \le e.
    • Expectation:
      E[Y] = \int_{1}^{e} x \cdot \frac{1}{x} \, dx = \int_{1}^{e} 1 \, dx = e - 1.
  • Result: E[e^{X}] = e - 1 (same result, different method).
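Both computations above can be cross-checked with a quick Monte Carlo estimate (illustrative only, standard library):

```python
import math
import random

# Monte Carlo estimate of E[e^X] for X ~ Uniform(0,1); should be close to e - 1.
random.seed(0)  # fixed seed for reproducibility
n = 200_000
estimate = sum(math.exp(random.random()) for _ in range(n)) / n
print(estimate, math.e - 1)  # estimate should be close to e - 1 ≈ 1.7183
```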

Normal Approximation to Binomial Distributions (DeMoivre–Laplace)

  • Let S_n \sim \text{Bin}(n, p). For any real numbers a < b,
    \lim_{n \to \infty} \Pr\left(a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b\right) = \Phi(b) - \Phi(a),
    where \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2} \, dt.
  • Binomial approximations:
    • Poisson approximation when $n$ is large and $p$ is small: if $np$ is moderate, use a Poisson with parameter $\lambda = np$.
    • Normal approximation is good when $np(1-p)$ is large, typically ≥ 10.

Examples: Binomial approximations and exact calculations

  • Example 1: 40 fair coin tosses, X = number of heads. Find \Pr(X = 20).
    • Continuity correction: \Pr(19.5 < X < 20.5).
    • Standardize: \frac{19.5 - 20}{\sqrt{40 \cdot 0.5 \cdot 0.5}} = \frac{-0.5}{\sqrt{10}} \approx -0.16,
      and similarly for 20.5. Thus
      \Pr(X = 20) \approx \Phi(0.16) - \Phi(-0.16) = 2\Phi(0.16) - 1 \approx 0.1272.
    • Exact value: \Pr(X = 20) = \binom{40}{20} 2^{-40} \approx 0.1254.
  • Example 2: Class size, 450 offers, each attending with probability 0.3. X ~ Bin(450, 0.3). Compute \Pr(X \ge 151).
    • Use continuity correction: \Pr(X \ge 151) = \Pr(X \ge 150.5).
    • With np = 450 × 0.3 = 135 and Var = 450 × 0.3 × 0.7 = 94.5, SD = \sqrt{94.5} \approx 9.72.
    • Z-value: Z = \frac{150.5 - 135}{\sqrt{94.5}} \approx \frac{15.5}{9.72} \approx 1.59.
    • Tail probability: \Pr(Z \ge 1.59) \approx 1 - \Phi(1.59) \approx 0.0559.
  • Example 3: In 10,000 coin tosses, observed heads = 5800. Is the coin fair?
    • If X ~ Bin(10000, 0.5), then E[X] = 5000, Var(X) = 2500, SD = 50.
    • Continuity-corrected: \Pr(X \ge 5800) = \Pr(X \ge 5799.5).
    • Standardize: Z = \frac{5799.5 - 5000}{50} = \frac{799.5}{50} = 15.99.
    • Result: probability is effectively 0; thus the coin is not fair in practical terms.
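The continuity-corrected calculations above are easy to reproduce; here is a minimal sketch for Example 1, using `math.erf` for Φ (the ≈ 0.1272 quoted in the notes comes from rounding z to 0.16 in a normal table):

```python
import math

def phi(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 40, 0.5
mu = n * p                          # 20
sigma = math.sqrt(n * p * (1 - p))  # sqrt(10) ≈ 3.162

# Continuity-corrected normal approximation: Pr(19.5 < X < 20.5).
approx = phi((20.5 - mu) / sigma) - phi((19.5 - mu) / sigma)

# Exact binomial probability: C(40, 20) * 2^{-40}.
exact = math.comb(n, 20) * 0.5 ** n

print(round(approx, 4), round(exact, 4))  # 0.1256 (approx) vs 0.1254 (exact)
```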

Exponential Distribution

  • Density:
    f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}
  • CDF:
    F(x) = \Pr(X \le x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}
  • Survival function: \Pr(X > x) = 1 - F(x) = e^{-\lambda x}, \quad x \ge 0.
  • Parameter: \lambda > 0.
  • Moments: If X \sim \text{Exp}(\lambda), then
    E[X] = \frac{1}{\lambda}, \quad \operatorname{Var}(X) = \frac{1}{\lambda^{2}}.
  • Recurrence for moments: E[X^{n}] = \frac{n}{\lambda} E[X^{n-1}].
  • Memoryless property:
    \Pr(T \ge t+s \mid T \ge s) = \Pr(T \ge t) = e^{-\lambda t}.
  • Practical note: Not a good model for human lifetime (memorylessness implies equal remaining life for people at different ages).
  • Relation to Poisson process: If events occur as a Poisson process with rate $\lambda$, then the time until the next event T is Exp($\lambda$), with tail $\Pr(T > t) = e^{-\lambda t}$.
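The memoryless property can be seen directly in simulation (a sketch using the standard library's `expovariate`):

```python
import math
import random

# Empirical check that Pr(T >= t+s | T >= s) ≈ Pr(T >= t) = e^{-lambda t}
# for T ~ Exp(lambda).
random.seed(1)
lam, s, t = 0.5, 2.0, 3.0
samples = [random.expovariate(lam) for _ in range(200_000)]

survivors = [x for x in samples if x >= s]            # condition on T >= s
cond = sum(x >= s + t for x in survivors) / len(survivors)
uncond = sum(x >= t for x in samples) / len(samples)
print(cond, uncond, math.exp(-lam * t))  # all ≈ 0.223
```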

Gamma Distribution

  • Motivation: Waiting time until the n-th event in a Poisson process. Let T_n be this waiting time.
  • CDF:
    F_{T_n}(t) = \Pr(T_n \le t) = \Pr(\text{at least } n \text{ events in } [0,t]) = \sum_{k=n}^{\infty} \frac{(\lambda t)^{k}}{k!} e^{-\lambda t} = 1 - \sum_{k=0}^{n-1} \frac{(\lambda t)^{k}}{k!} e^{-\lambda t}.
  • PDF (Gamma density): for shape $\alpha > 0$ and rate $\lambda > 0$,
    g(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha - 1} e^{-\lambda t}, \quad t > 0.
  • When $\alpha = 1$, Gamma reduces to the Exponential distribution: \text{Exp}(\lambda) = \Gamma(1, \lambda).
  • Interpretation: For integer $\alpha$, the waiting time until the $\alpha$-th event is Gamma with parameters $(\alpha, \lambda)$, and a Gamma$(\alpha, \lambda)$ variable is the sum of $\alpha$ i.i.d. Exp($\lambda$) variables (the sum representation holds only for integer $\alpha$).
  • Applications: modeling arrival times in Poisson processes, time to failures, positive-valued quantities (rainfalls), and financial quantities like insurance claim sizes.
  • Gamma function connection: The gamma function is defined as \Gamma(x) = \int_{0}^{\infty} u^{x-1} e^{-u} \, du, \quad x > 0.
    • It satisfies the recursive relation
      \Gamma(x) = (x-1) \Gamma(x-1) \quad (x > 1).
    • For positive integers, \Gamma(n) = (n-1)!.
  • Mean and variance of Gamma$(\alpha, \lambda)$:
    • E[X] = \frac{\alpha}{\lambda}, \quad \operatorname{Var}(X) = \frac{\alpha}{\lambda^{2}}.
  • Important special cases and interpretations:
    • If $X \sim \Gamma(n, \lambda)$ with integer $n$, then $X$ is the sum of $n$ i.i.d. Exp($\lambda$) variables.
    • The Gamma distribution generalizes the exponential distribution; when $\alpha=1$ we recover Exp($\lambda$).
  • Example in seismology: times between microearthquakes may be modeled with Gamma vs Exponential; Gamma often provides better fit when inter-event times exhibit variability beyond Poisson assumptions.
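The sum-of-exponentials interpretation is easy to verify empirically; a sketch with illustrative values, integer shape n = 3 and rate λ = 2:

```python
import random

# A Gamma(n, lam) variable with integer shape n is the sum of n i.i.d.
# Exp(lam) variables; its sample mean and variance should be near
# n/lam and n/lam^2 respectively.
random.seed(2)
n_shape, lam, trials = 3, 2.0, 100_000

sums = [sum(random.expovariate(lam) for _ in range(n_shape)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((x - mean) ** 2 for x in sums) / trials
print(mean, var)  # ≈ n/lam = 1.5 and n/lam^2 = 0.75
```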

Additional notes on Gamma and related concepts

  • If $X \sim \Gamma(\alpha, \lambda)$, then the case $\alpha = n$ (a positive integer) corresponds to the sum of $n$ independent Exp($\lambda$) random variables.
  • The gamma function appears in the normalization constant of the gamma density; it generalizes the factorial to non-integer values.
  • In practice, the gamma distribution is used for positive-valued data with skewness controlled by the shape parameter $\alpha$; larger $\alpha$ yields more symmetric (approximately normal) shapes when scaled appropriately.

Quick recap and connections to foundational principles

  • Linearity of expectation: the expectation of a sum is the sum of expectations, and expectation commutes with linear transformations: E[aX + b] = aE[X] + b.
  • Integral representations and Fubini/Tonelli theorem allow swapping order of integration, enabling tail-integral representations.
  • Normal approximation to binomial relies on central limit intuition: binomial counts behave like normal distributions under appropriate scaling.
  • Exponential distribution embodies memoryless property, linking to Poisson processes and waiting-time problems.
  • Gamma distribution connects to Poisson processes as the waiting time to the n-th event; its properties extend the exponential case to sums of independent exponentials.

Key formulas to remember (LaTeX)

  • Expectation of a transformation:
    E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
  • Linear transformation of X:
    E[aX + b] = a E[X] + b.
  • Tail-integral representation (for nonnegative g):
    E[g(X)] = \int_{0}^{\infty} \Pr(g(X) > y) \, dy.
  • Normal approximation to Binomial (DeMoivre–Laplace):
    \lim_{n \to \infty} \Pr\left(a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b\right) = \Phi(b) - \Phi(a).
  • Exponential distribution:
    f(x) = \lambda e^{-\lambda x} (x \ge 0), \quad F(x) = 1 - e^{-\lambda x} (x \ge 0), \quad \Pr(X > x) = e^{-\lambda x}.
  • Mean and variance of Exp($\lambda$):
    E[X] = \frac{1}{\lambda}, \quad \operatorname{Var}(X) = \frac{1}{\lambda^{2}}.
  • Memoryless property (Exponential):
    \Pr(T \ge t+s \mid T \ge s) = \Pr(T \ge t) = e^{-\lambda t}.
  • Gamma density (shape $\alpha$, rate $\lambda$):
    g(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha - 1} e^{-\lambda t}, \quad t > 0.
  • Gamma function:
    \Gamma(x) = \int_{0}^{\infty} u^{x-1} e^{-u} \, du, \quad x > 0,
    \Gamma(x) = (x-1)\Gamma(x-1), \quad \Gamma(n) = (n-1)! \ (n \in \mathbb{N}).
  • Gamma moments: for $X \sim \Gamma(\alpha, \lambda)$,
    E[X] = \frac{\alpha}{\lambda}, \quad \operatorname{Var}(X) = \frac{\alpha}{\lambda^{2}}.