Sampling Distribution and Inference - Quick Notes

Sampling distribution basics

  • Sampling distribution: the distribution of a statistic across all possible samples of size n from a population.
  • Key idea: use the sampling distribution of a sample statistic to make inferences about a population parameter (μ, p, etc.).
  • Population vs. sample concepts:
    • Population measures are parameters (e.g., population mean μ, true proportion p).
    • Sample measures are statistics (e.g., sample mean X̄, sample proportion p̂).
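To make "all possible samples" concrete, here is a small Python sketch that enumerates every sample from a toy population. The population [2, 4, 6, 8] and the sample size n = 2 (drawn with replacement) are hypothetical choices for illustration, not data from these notes; exact fractions are used so the results can be checked by hand.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population (not from the notes), small enough that
# every possible sample can be listed.
population = [2, 4, 6, 8]
n = 2  # sample size; samples are drawn WITH replacement

# Parameters: computed from the whole population (variance divides by N).
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

# Statistic: the sample mean X-bar for every possible ordered sample of size n.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# This list of sample means IS the sampling distribution of X-bar.
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print("mu =", mu, " sigma^2 =", sigma2)
print("E[X-bar] =", mean_of_means, " Var(X-bar) =", var_of_means)
```

With these choices the parameters are μ = 5 and σ² = 5, and the 16 sample means average exactly 5 with variance 5/2 = σ²/n, matching the formulas in the sample-mean section.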

Central Limit Theorem (CLT)

  • If a random sample of independent observations is drawn from any population with finite variance σ², the sampling distribution of the sample mean X̄ is approximately normal when the sample size is large enough (a common rule of thumb is n ≥ 30; strongly skewed populations may need more).
  • If the population is itself normal, X̄ is exactly normal for every sample size n, so small samples are not a problem.
  • The CLT lets us standardize X̄ and compute probabilities with the standard normal distribution.
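A quick simulation can illustrate the CLT. This sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1), chosen only because it is clearly skewed; the sample size n = 30, the 20,000 repetitions, and the seed are arbitrary illustration choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Assumed population: exponential with rate 1 (mean 1, sd 1), strongly right-skewed.
n = 30         # sample size (the common rule of thumb)
reps = 20_000  # number of simulated samples

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# CLT prediction: X-bar is approximately Normal(mu = 1, sd = 1/sqrt(n)).
predicted_se = 1.0 / n ** 0.5
print("mean of X-bar:", round(statistics.fmean(sample_means), 4))
print("sd of X-bar:  ", round(statistics.stdev(sample_means), 4),
      "vs predicted", round(predicted_se, 4))

# Rough normality check: about 95% of sample means should lie within 1.96 SE of mu.
inside = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in sample_means) / reps
print("share within 1.96 SE:", round(inside, 3))
```

The simulated mean and spread of X̄ should land close to 1 and 1/√30 ≈ 0.18, and roughly 95% of the sample means should fall within 1.96 SE of μ even though the population itself is far from normal.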

Sampling distribution of the sample mean (X̄)

  • Mean of the sampling distribution: $\mathbb{E}[\bar{X}] = \mu$.
  • Variance of the sampling distribution: $\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$.
  • Standard error of the mean: $\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
  • Standardization to Z: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
  • Finite population correction (when sampling without replacement from a population of finite size N):
    $\sigma_{\bar{X}} = \sigma \sqrt{\frac{N-n}{N-1}} \cdot \frac{1}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$.
    Note: often ignored when N is much larger than n (e.g., when the sample is less than about 5% of the population).
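The formulas above translate directly into two small helpers. The function names `standard_error` and `z_score` are hypothetical, and the optional `N` argument switches on the finite population correction; the usage lines reuse the chocolate-pack numbers from the examples and a made-up population size N = 100.

```python
import math

def standard_error(sigma, n, N=None):
    """SE of X-bar = sigma / sqrt(n); applies the finite population correction if N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        # FPC factor sqrt((N - n) / (N - 1)) for sampling without replacement
        se *= math.sqrt((N - n) / (N - 1))
    return se

def z_score(xbar, mu, sigma, n, N=None):
    """Standardize a sample mean: Z = (X-bar - mu) / SE."""
    return (xbar - mu) / standard_error(sigma, n, N)

# Usage: chocolate pack of 4 bars (sigma = 0.3, mu = 32.2)
print(standard_error(0.3, 4))          # SE = 0.15
print(z_score(32, 32.2, 0.3, 4))       # about -1.333
# Hypothetical finite population of N = 100 with sigma = 100, n = 25
print(standard_error(100, 25, N=100))  # smaller than the uncorrected SE of 20
```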

Inference with X̄

  • Because X̄ ≈ Normal(μ, σ^2/n) for large n, we can make confidence statements about μ using the standard normal distribution (assuming σ is known).
  • Increasing n reduces the SE in proportion to 1/√n and tightens the distribution around μ; quadrupling n halves the SE.
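As a sketch of such a confidence statement, the code below builds the usual 95% z-interval X̄ ± 1.96·σ/√n for several sample sizes. It assumes σ = 100 is known (borrowed from the weekly income example) and uses a hypothetical observed mean X̄ = 610; only the shrinking width, not the specific numbers, is the point.

```python
import math
from statistics import NormalDist

sigma = 100  # population sd, assumed known (as in the weekly income example)
xbar = 610   # HYPOTHETICAL observed sample mean, not from the notes

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% two-sided interval

intervals = {}
for n in (25, 100, 400):
    se = sigma / math.sqrt(n)  # SE halves each time n quadruples
    intervals[n] = (xbar - z_star * se, xbar + z_star * se)
    lo, hi = intervals[n]
    print(f"n={n:3d}  SE={se:4.1f}  95% CI for mu: ({lo:.1f}, {hi:.1f})")
```

Each fourfold increase in n halves the SE and therefore halves the interval width (about 78.4 at n = 25 and 39.2 at n = 100).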

Sampling distribution of the sample proportion p̂

  • For Bernoulli trials with true population proportion p, let X be the number of successes in n trials: X ~ Bin(n, p).
  • Sample proportion: $\hat{p} = \frac{X}{n}$.
  • Mean and variance of p̂:
    $\mathbb{E}[\hat{p}] = p, \quad \operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}$.
  • Standardization: $Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$.
  • The normal approximation to p̂ is reasonable when the rule of thumb is met: $np \ge 10 \quad\text{and}\quad n(1-p) \ge 10$.
  • For X (number of successes), X ~ Bin(n, p) with mean $\mathbb{E}[X] = np$ and variance $\operatorname{Var}(X) = np(1-p)$.
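The proportion formulas and the rule of thumb can be bundled into a small helper. The names `phat_summary` and `count_summary` are hypothetical; the check uses the np ≥ 10 and n(1 − p) ≥ 10 thresholds stated above.

```python
import math

def phat_summary(p, n):
    """Mean, SE, and normal-approximation check for the sample proportion p-hat."""
    mean = p                                        # E[p-hat] = p
    se = math.sqrt(p * (1 - p) / n)                 # sqrt(p(1-p)/n)
    normal_ok = n * p >= 10 and n * (1 - p) >= 10   # rule of thumb from the notes
    return mean, se, normal_ok

def count_summary(p, n):
    """Mean and variance of X ~ Bin(n, p), the number of successes."""
    return n * p, n * p * (1 - p)

print(phat_summary(0.30, 1000))   # Laurier numbers: SE about 0.0145, check passes
print(phat_summary(0.02, 100))    # np = 2, so the check fails
print(count_summary(0.30, 1000))  # mean about 300, variance about 210
```

For p = 0.02 with n = 100 the expected number of successes is only 2, so the normal approximation would not be advised.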

Bernoulli trials, independence, and sampling with replacement

  • Bernoulli trial: one trial with two outcomes (success/failure).
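  • The binomial model requires independent trials with the same success probability p; sampling with replacement keeps p constant from trial to trial (sampling without replacement is a close approximation when the population is much larger than the sample).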
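To see what independent, identically distributed trials give you, this sketch simulates p̂ as the average of independent Bernoulli(p) trials, which is what sampling with replacement provides. The values p = 0.30 and n = 1000 mirror the Laurier example; the number of repetitions and the seed are arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

p = 0.30     # true success probability (as in the Laurier example)
n = 1000     # Bernoulli trials per sample
reps = 2000  # number of simulated samples

def one_sample_phat():
    # n independent Bernoulli(p) trials, each with the same p
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

phats = [one_sample_phat() for _ in range(reps)]
print("mean of p-hat:", round(statistics.fmean(phats), 4), "theory:", p)
print("sd of p-hat:  ", round(statistics.stdev(phats), 4),
      "theory:", round((p * (1 - p) / n) ** 0.5, 4))
```

The simulated mean and standard deviation of p̂ should sit close to p and √(p(1 − p)/n) ≈ 0.0145.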

Examples (illustrative calculations)

  • Chocolate weights example
    • Weight of a single bar: X ~ Normal(μ = 32.2, σ = 0.3).
    • Q1: P(X > 32) = ?
    • Z = (32 - 32.2)/0.3 = -0.667 ➜ P(X > 32) ≈ 0.75.
    • Q2: For a pack of n = 4 bars, P(X̄ > 32) = ? Here X̄ ~ Normal(μ = 32.2, σ^2/n = 0.3^2/4 = 0.0225).
    • SE = 0.3/√4 = 0.15; Z = (32 - 32.2)/0.15 = -1.333 ➜ P(X̄ > 32) ≈ 0.91.
    • Intuition: averaging over more bars shrinks the SE, so X̄ concentrates more tightly around μ and P(X̄ > 32) rises from about 0.75 to about 0.91.
  • Weekly income example
    • Population: μ = 600, σ = 100, n = 25.
    • P(X̄ < 550) = P(Z < (550-600)/(100/√25)) = P(Z < -2.5) ≈ 0.0062 (0.62%).
  • Proportions example (Laurier brand)
    • Population proportion: p = 0.30, n = 1000.
    • Q: P(p̂ > 0.32) = ?
    • Z = (0.32 - 0.30)/√(p(1-p)/n) = 0.02 / √(0.21/1000) ≈ 1.38
    • P(p̂ > 0.32) ≈ 1 - Φ(1.38) ≈ 0.08 (8%).
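The three worked examples can be reproduced with Python's standard-library `statistics.NormalDist` in place of a printed Z table. This is a sketch; table lookups round z to two decimals, so the last digit may differ slightly from the values quoted above.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and sd 1

# Chocolate weights: X ~ Normal(32.2, 0.3); pack of n = 4 bars.
mu, sigma, n = 32.2, 0.3, 4
p_bar = 1 - Z.cdf((32 - mu) / sigma)                    # P(X > 32)
p_pack = 1 - Z.cdf((32 - mu) / (sigma / math.sqrt(n)))  # P(X-bar > 32)

# Weekly income: mu = 600, sigma = 100, n = 25.
p_income = Z.cdf((550 - 600) / (100 / math.sqrt(25)))   # P(X-bar < 550)

# Laurier brand: p = 0.30, sample size 1000.
p, m = 0.30, 1000
se_p = math.sqrt(p * (1 - p) / m)
p_prop = 1 - Z.cdf((0.32 - p) / se_p)                   # P(p-hat > 0.32)

print(f"P(X > 32)       = {p_bar:.4f}")
print(f"P(X-bar > 32)   = {p_pack:.4f}")
print(f"P(X-bar < 550)  = {p_income:.4f}")
print(f"P(p-hat > 0.32) = {p_prop:.4f}")
```

The printed values come out at about 0.7475, 0.9088, 0.0062 and 0.0838, consistent with the rounded answers in the examples.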

Finite population correction (recap)

  • When sampling without replacement from a finite population, the standard error is adjusted by: $\text{SE} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$.
  • For very large N, the correction factor is ≈ 1 and is often ignored.
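To see how quickly the correction fades, this sketch prints the factor √((N − n)/(N − 1)) for a fixed n = 25 and a few population sizes N; the N values are arbitrary illustration choices and `fpc` is a hypothetical helper name.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

n = 25
for N in (50, 100, 1_000, 100_000):
    print(f"N={N:>7,}  n/N={n / N:.4f}  FPC={fpc(N, n):.4f}")
```

The factor is about 0.71 at N = 50 and 0.87 at N = 100, about 0.99 at N = 1,000, and essentially 1 at N = 100,000, which is why it is usually dropped for large populations.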

Quick takeaways

  • The sampling distribution describes how a statistic would behave over many samples, enabling inference about population parameters.
  • X̄ has mean μ and variance σ^2/n; SE shrinks as n grows; CLT guarantees approximate normality for large n.
  • p̂ has mean p and variance p(1-p)/n; normal approximation requires np ≥ 10 and n(1-p) ≥ 10.
  • Standardize to Z to use the standard normal table for probabilities.
  • Larger samples reduce the variability of estimates and make them more precise; hypothesis tests and confidence statements rely on these sampling distributions.
  • Always consider whether finite population correction is needed when N is not much larger than n.

Quick formulas to memorize

  • For X̄: $\mathbb{E}[\bar{X}] = \mu,\quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},\quad \text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
  • Standardize X̄: $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$.
  • For p̂: $\mathbb{E}[\hat{p}] = p,\quad \operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n},\quad Z = \frac{\hat{p}-p}{\sqrt{p(1-p)/n}}$.
  • Normal approximation conditions for p̂: $np \ge 10,\; n(1-p) \ge 10$.
  • Finite population correction (FPC): $\text{SE}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$.