Week 7

Expectation and Linear Transformations

  • Let X be a continuous random variable with probability density function (pdf) f(x).
  • For any real-valued function g, the expectation is given by
    E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
  • Special case: if Y = aX + b, then
    E[aX + b] = a E[X] + b,
    since
    E[aX + b] = \int_{-\infty}^{\infty} (a x + b) f(x) \, dx = a \int_{-\infty}^{\infty} x f(x) \, dx + b \int_{-\infty}^{\infty} f(x) \, dx = a E[X] + b.

Nonnegative g(X): an alternative expression for E[g(X)]

  • Theorem (nonnegative g): if g(x) ≥ 0 for all x, then
    E[g(X)] = \int_{0}^{\infty} \Pr(g(X) > y) \, dy.
  • Sketch of proof:
    • For nonnegative Y, we have E[Y] = \int_{0}^{\infty} \Pr(Y > y) \, dy.
    • Apply this with Y = g(X): since \Pr(g(X) > y) = \int_{\{x : g(x) > y\}} f(x) \, dx,
      interchanging the order of integration (Tonelli/Fubini) yields
      E[g(X)] = \int_{0}^{\infty} \int_{\{x : g(x) > y\}} f(x) \, dx \, dy = \int_{-\infty}^{\infty} \int_{0}^{g(x)} dy \, f(x) \, dx = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
  • This justifies using the tail integral representation for nonnegative functions of X.
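As a quick numeric sanity check of the tail-integral representation (an illustration, not part of the notes' derivation), take X ~ Uniform(0,1) and g(x) = x²; then Pr(g(X) > y) = 1 − √y on (0,1), and both sides equal 1/3:

```python
import math

# Check E[g(X)] = \int_0^inf Pr(g(X) > y) dy for X ~ Uniform(0,1), g(x) = x^2.
# Direct formula: E[X^2] = \int_0^1 x^2 dx = 1/3.
# Tail formula:   \int_0^1 (1 - sqrt(y)) dy, since Pr(X^2 > y) = 1 - sqrt(y) on (0,1).

def tail_integral(n_steps=100_000):
    # Midpoint Riemann sum of 1 - sqrt(y) over (0, 1); the integrand
    # vanishes for y >= 1, so the integration stops there.
    dy = 1.0 / n_steps
    return sum((1.0 - math.sqrt((k + 0.5) * dy)) * dy for k in range(n_steps))

print(tail_integral(), 1 / 3)  # both ≈ 0.3333
```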

Example: E[e^X] for X ~ Uniform(0,1)

  • Density:
    f(x) = \begin{cases} 1, & 0 \le x \le 1, \\ 0, & \text{otherwise}. \end{cases}
  • Directly,
    E[e^{X}] = \int_{0}^{1} e^{x} \, dx = e - 1.
  • Alternative method via Y = e^X:
    • CDF: for 1 ≤ x ≤ e,
      F_Y(x) = \Pr(Y \le x) = \Pr(e^{X} \le x) = \Pr(X \le \log x) = \int_{0}^{\log x} f(t) \, dt = \log x.
    • Therefore pdf:
      f_Y(x) = F_Y'(x) = \frac{1}{x}, \quad 1 \le x \le e.
    • Expectation:
      E[Y] = \int_{1}^{e} x \cdot \frac{1}{x} \, dx = \int_{1}^{e} 1 \, dx = e - 1.
  • Result: E[e^{X}] = e - 1 (same result, different method).
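Both computations above can be cross-checked with a quick Monte Carlo estimate (illustrative only, standard library):

```python
import math
import random

# Monte Carlo estimate of E[e^X] for X ~ Uniform(0,1); should be close to e - 1.
random.seed(0)  # fixed seed for reproducibility
n = 200_000
estimate = sum(math.exp(random.random()) for _ in range(n)) / n
print(estimate, math.e - 1)  # estimate should be close to e - 1 ≈ 1.7183
```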

Normal Approximation to Binomial Distributions (DeMoivre–Laplace)

  • Let S_n \sim \text{Bin}(n, p). For any real numbers a < b,
    \lim_{n \to \infty} \Pr\left(a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b\right) = \Phi(b) - \Phi(a),
    where \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2} \, dt.
  • Binomial approximations:
    • Poisson approximation when $n$ is large and $p$ is small: if $np$ is moderate, use a Poisson with parameter $\lambda = np$.
    • Normal approximation is good when $np(1-p)$ is large, typically ≥ 10.

Examples: Binomial approximations and exact calculations

  • Example 1: 40 fair coin tosses, X = number of heads. Find \Pr(X = 20).
    • Continuity correction: \Pr(19.5 < X < 20.5).
    • Standardize: \frac{19.5 - 20}{\sqrt{40 \cdot 0.5 \cdot 0.5}} = \frac{-0.5}{\sqrt{10}} \approx -0.16,
      and similarly for 20.5. Thus
      \Pr(X = 20) \approx \Phi(0.16) - \Phi(-0.16) = 2\Phi(0.16) - 1 \approx 0.1272.
    • Exact value: \Pr(X = 20) = \binom{40}{20} 2^{-40} \approx 0.1254.
  • Example 2: Class size, 450 offers, each attending with probability 0.3. X ~ Bin(450, 0.3). Compute \Pr(X \ge 151).
    • Use continuity correction: \Pr(X \ge 151) = \Pr(X \ge 150.5).
    • With np = 450 × 0.3 = 135 and Var = 450 × 0.3 × 0.7 = 94.5, SD = \sqrt{94.5} \approx 9.72.
    • Z-value: Z = \frac{150.5 - 135}{\sqrt{94.5}} \approx \frac{15.5}{9.72} \approx 1.59.
    • Tail probability: \Pr(Z \ge 1.59) \approx 1 - \Phi(1.59) \approx 0.0559.
  • Example 3: In 10,000 coin tosses, observed heads = 5800. Is the coin fair?
    • If X ~ Bin(10000, 0.5), then E[X] = 5000, Var(X) = 2500, SD = 50.
    • Continuity-corrected: \Pr(X \ge 5800) = \Pr(X \ge 5799.5).
    • Standardize: Z = \frac{5799.5 - 5000}{50} = \frac{799.5}{50} = 15.99.
    • Result: probability is effectively 0; thus the coin is not fair in practical terms.
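The continuity-corrected calculations above are easy to reproduce; here is a minimal sketch for Example 1, using `math.erf` for Φ (the ≈ 0.1272 quoted in the notes comes from rounding z to 0.16 in a normal table):

```python
import math

def phi(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 40, 0.5
mu = n * p                          # 20
sigma = math.sqrt(n * p * (1 - p))  # sqrt(10) ≈ 3.162

# Continuity-corrected normal approximation: Pr(19.5 < X < 20.5).
approx = phi((20.5 - mu) / sigma) - phi((19.5 - mu) / sigma)

# Exact binomial probability: C(40, 20) * 2^{-40}.
exact = math.comb(n, 20) * 0.5 ** n

print(round(approx, 4), round(exact, 4))  # 0.1256 (approx) vs 0.1254 (exact)
```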

Exponential Distribution

  • Density:
    f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}
  • CDF:
    F(x) = \Pr(X \le x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}
  • Survival function: \Pr(X > x) = 1 - F(x) = e^{-\lambda x}, \quad x \ge 0.
  • Parameter: \lambda > 0.
  • Moments: If X \sim \text{Exp}(\lambda), then
    E[X] = \frac{1}{\lambda}, \quad \operatorname{Var}(X) = \frac{1}{\lambda^{2}}.
  • Recurrence for moments: E[X^{n}] = \frac{n}{\lambda} E[X^{n-1}].
  • Memoryless property:
    \Pr(T \ge t+s \mid T \ge s) = \Pr(T \ge t) = e^{-\lambda t}.
  • Practical note: Not a good model for human lifetime (memorylessness implies equal remaining life for people at different ages).
  • Relation to Poisson process: If events occur as a Poisson process with rate $\lambda$, then the time until the next event T is Exp($\lambda$), with tail $\Pr(T > t) = e^{-\lambda t}$.
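The memoryless property can be seen directly in simulation (a sketch using the standard library's `expovariate`):

```python
import math
import random

# Empirical check that Pr(T >= t+s | T >= s) ≈ Pr(T >= t) = e^{-lambda t}
# for T ~ Exp(lambda).
random.seed(1)
lam, s, t = 0.5, 2.0, 3.0
samples = [random.expovariate(lam) for _ in range(200_000)]

survivors = [x for x in samples if x >= s]            # condition on T >= s
cond = sum(x >= s + t for x in survivors) / len(survivors)
uncond = sum(x >= t for x in samples) / len(samples)
print(cond, uncond, math.exp(-lam * t))  # all ≈ 0.223
```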

Gamma Distribution

  • Motivation: Waiting time until the n-th event in a Poisson process. Let T_n be this waiting time.
  • CDF:
    F_{T_n}(t) = \Pr(T_n \le t) = \Pr(\text{at least } n \text{ events in } [0,t]) = \sum_{k=n}^{\infty} \frac{(\lambda t)^{k}}{k!} e^{-\lambda t} = 1 - \sum_{k=0}^{n-1} \frac{(\lambda t)^{k}}{k!} e^{-\lambda t}.
  • PDF (Gamma density): for shape $\alpha > 0$ and rate $\lambda > 0$,
    g(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha - 1} e^{-\lambda t}, \quad t > 0.
  • When $\alpha = 1$, Gamma reduces to the Exponential distribution: \text{Exp}(\lambda) = \Gamma(1, \lambda).
  • Interpretation: For integer $\alpha$, the waiting time until the $\alpha$-th event is Gamma with parameters $(\alpha, \lambda)$, and a Gamma$(\alpha, \lambda)$ variable is the sum of $\alpha$ i.i.d. Exp($\lambda$) variables (the sum representation holds only for integer $\alpha$).
  • Applications: modeling arrival times in Poisson processes, time to failures, positive-valued quantities (rainfalls), and financial quantities like insurance claim sizes.
  • Gamma function connection: The gamma function is defined as \Gamma(x) = \int_{0}^{\infty} u^{x-1} e^{-u} \, du, \quad x > 0.
    • It satisfies the recursive relation
      \Gamma(x) = (x-1) \Gamma(x-1) \quad (x > 1).
    • For positive integers, \Gamma(n) = (n-1)!.
  • Mean and variance of Gamma$(\alpha, \lambda)$:
    • E[X] = \frac{\alpha}{\lambda}, \quad \operatorname{Var}(X) = \frac{\alpha}{\lambda^{2}}.
  • Important special cases and interpretations:
    • If $X \sim \Gamma(n, \lambda)$ with integer $n$, then $X$ is the sum of $n$ i.i.d. Exp($\lambda$) variables.
    • The Gamma distribution generalizes the exponential distribution; when $\alpha=1$ we recover Exp($\lambda$).
  • Example in seismology: times between microearthquakes may be modeled with Gamma vs Exponential; Gamma often provides better fit when inter-event times exhibit variability beyond Poisson assumptions.
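The sum-of-exponentials interpretation is easy to verify empirically; a sketch with illustrative values, integer shape n = 3 and rate λ = 2:

```python
import random

# A Gamma(n, lam) variable with integer shape n is the sum of n i.i.d.
# Exp(lam) variables; its sample mean and variance should be near
# n/lam and n/lam^2 respectively.
random.seed(2)
n_shape, lam, trials = 3, 2.0, 100_000

sums = [sum(random.expovariate(lam) for _ in range(n_shape)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((x - mean) ** 2 for x in sums) / trials
print(mean, var)  # ≈ n/lam = 1.5 and n/lam^2 = 0.75
```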

Additional notes on Gamma and related concepts

  • If $X \sim \Gamma(\alpha, \lambda)$, then the case $\alpha = n$ (a positive integer) corresponds to the sum of $n$ independent Exp($\lambda$) random variables.
  • The gamma function appears in the normalization constant of the gamma density; it generalizes the factorial to non-integer values.
  • In practice, the gamma distribution is used for positive-valued data with skewness controlled by the shape parameter $\alpha$; larger $\alpha$ yields more symmetric (approximately normal) shapes when scaled appropriately.

Quick recap and connections to foundational principles

  • Linearity of expectation: the expectation of a sum is the sum of expectations, and expectation commutes with linear transformations: E[aX + b] = aE[X] + b.
  • Integral representations and Fubini/Tonelli theorem allow swapping order of integration, enabling tail-integral representations.
  • Normal approximation to binomial relies on central limit intuition: binomial counts behave like normal distributions under appropriate scaling.
  • Exponential distribution embodies memoryless property, linking to Poisson processes and waiting-time problems.
  • Gamma distribution connects to Poisson processes as the waiting time to the n-th event; its properties extend the exponential case to sums of independent exponentials.

Key formulas to remember (LaTeX)

  • Expectation of a transformation:
    E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
  • Linear transformation of X:
    E[aX + b] = a E[X] + b.
  • Tail-integral representation (for nonnegative g):
    E[g(X)] = \int_{0}^{\infty} \Pr(g(X) > y) \, dy.
  • Normal approximation to Binomial (DeMoivre–Laplace):
    \lim_{n \to \infty} \Pr\left(a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b\right) = \Phi(b) - \Phi(a).
  • Exponential distribution:
    f(x) = \lambda e^{-\lambda x} (x \ge 0), \quad F(x) = 1 - e^{-\lambda x} (x \ge 0), \quad \Pr(X > x) = e^{-\lambda x}.
  • Mean and variance of Exp($\lambda$):
    E[X] = \frac{1}{\lambda}, \quad \operatorname{Var}(X) = \frac{1}{\lambda^{2}}.
  • Memoryless property (Exponential):
    \Pr(T \ge t+s \mid T \ge s) = \Pr(T \ge t) = e^{-\lambda t}.
  • Gamma density (shape $\alpha$, rate $\lambda$):
    g(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha - 1} e^{-\lambda t}, \quad t > 0.
  • Gamma function:
    \Gamma(x) = \int_{0}^{\infty} u^{x-1} e^{-u} \, du, \quad x > 0,
    \Gamma(x) = (x-1)\Gamma(x-1), \quad \Gamma(n) = (n-1)! \ (n \in \mathbb{N}).
  • Gamma moments: for $X \sim \Gamma(\alpha, \lambda)$,
    E[X] = \frac{\alpha}{\lambda}, \quad \operatorname{Var}(X) = \frac{\alpha}{\lambda^{2}}.