Sampling Distributions & Central Limit Theorem – Comprehensive Study Notes

Sampling Distributions: Core Ideas

  • A sampling distribution is the probability distribution of a statistic (e.g., a sample mean (\bar{x}), proportion (\hat{p}), variance (s^2)) constructed from all possible samples of a given size drawn from a population.
  • It is not the distribution of the raw data themselves; instead it tells us how the statistic behaves from sample to sample.
  • Key practical purpose: allows us to quantify sampling variability, build confidence intervals, and conduct hypothesis tests.

Fundamental Notation & Definitions

  • Population mean: (\mu)
  • Population standard deviation: (\sigma)
  • Sample size: (n)
  • Sample mean: (\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i)
  • Standard Error (SE) of the mean:
    (SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}})
  • Relationship of sampling-distribution parameters to population parameters:
    • Mean of (\bar{x}): (\mu_{\bar{x}} = \mu)
    • Std. dev. of (\bar{x}): (\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}})

Distribution of a Single Fair Die

  • Experiment: roll one fair six-sided die infinitely many times.
  • Random variable (X = \text{number of spots}); probability mass function (pmf):
    (P(X=x) = \frac{1}{6}, \quad x = 1,2,3,4,5,6)
  • Population (per-roll) mean & variance:
    (\mu = 3.5, \quad \sigma^2 = \frac{35}{12} \approx 2.9167) (not explicitly in slides but useful)
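The per-roll mean and variance can be verified by direct enumeration of the pmf; a minimal sketch (not from the slides), using exact fractions:

```python
from fractions import Fraction

# pmf of one fair die: P(X = x) = 1/6 for x = 1..6
faces = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in faces)                 # population mean
var = sum((x - mu) ** 2 * p for x in faces)    # population variance

print(mu)    # 7/2  -> 3.5
print(var)   # 35/12 -> approx. 2.9167
```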

Mean of Two Dice (Sample Size n = 2)

  • Possible ordered pairs (36) are all equally likely; they form samples of size 2.
  • The statistic of interest: (\bar{x} = \frac{X_1 + X_2}{2}).
  • Although 36 distinct pairs exist, (\bar{x}) can take only 11 distinct values: (1, 1.5, 2, \ldots, 6).
  • Frequencies of each (\bar{x}):
    • E.g., (\bar{x}=3.5) occurs most often; (\bar{x}=1) or (6) least often.
  • Sampling‐distribution pmf shown visually in slides (bars labelled 6/36, 5/36, … ,1/36).
  • Demonstrates concentration toward the center as sample size grows: extreme averages become rarer than extreme individual values.
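The 36-pair enumeration above can be checked in a few lines; a sketch (variable names are illustrative):

```python
from itertools import product
from collections import Counter

# All 36 equally likely ordered pairs of two fair dice, and their means
means = [(a + b) / 2 for a, b in product(range(1, 7), repeat=2)]
counts = Counter(means)

print(len(counts))                 # 11 distinct values of x-bar
print(counts[3.5])                 # 6   -> P(x-bar = 3.5) = 6/36, the mode
print(counts[1.0], counts[6.0])    # 1 1 -> P = 1/36 each, the rarest
```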

Comparing Population & Sampling Distributions

  • For the die example (or any i.i.d. setting):
    • (\mu_{\bar{x}} = \mu) → the sampling distribution is centered at the true population mean.
    • (\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}) (slide typo omits the square root; the correct formula is (\sigma/\sqrt{n})).
    • Terminology: sampling-distribution standard deviation = Standard Error (SE).
  • Practical reading: larger n → smaller SE → tighter clustering of (\bar{x}) around (\mu).
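The SE formula can be verified against the two-dice example: the exact standard deviation of (\bar{x}) computed by enumeration matches (\sigma/\sqrt{2}). A sketch:

```python
import math
from itertools import product

sigma = math.sqrt(35 / 12)           # per-roll std dev of a fair die

# Exact sampling distribution of the mean of n = 2 dice, by enumeration
means = [(a + b) / 2 for a, b in product(range(1, 7), repeat=2)]
mu_bar = sum(means) / 36
sd_bar = math.sqrt(sum((m - mu_bar) ** 2 for m in means) / 36)

print(mu_bar)                        # 3.5, equal to the population mean
print(sd_bar, sigma / math.sqrt(2))  # both ~1.2076: SE = sigma / sqrt(n)
```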

Central Limit Theorem (CLT)

  • Statement (informal): For a random sample of size (n) from any population with mean (\mu) and finite variance (\sigma^2), the distribution of (\bar{X}) approaches the Normal (\mathcal{N}(\mu, \sigma^2/n)) as (n) becomes large.
  • If the underlying population is already Normal, then (\bar{X}) is exactly Normal for every (n).
  • If the population is non-Normal (e.g., skewed salaries), a larger (n) is needed before the Normal approximation is adequate.
  • CLT justifies z-procedures (confidence intervals, tests) for many practical problems.
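A quick Monte Carlo sketch (not from the slides; the exponential population and the replication counts are arbitrary choices) illustrates the CLT: sample means from a right-skewed population cluster near (\mu) with spread shrinking like (\sigma/\sqrt{n}):

```python
import math
import random
import statistics

random.seed(0)

# Right-skewed population: Exponential(rate = 1), so mu = sigma = 1
def sample_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in (1, 5, 30):
    draws = [sample_mean(n) for _ in range(20_000)]
    m, s = statistics.mean(draws), statistics.stdev(draws)
    # m stays near mu = 1; s shrinks toward sigma / sqrt(n)
    print(n, round(m, 3), round(s, 3), round(1 / math.sqrt(n), 3))
```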

Worked Example 1 – Soda Bottling: Single Bottle

Scenario

  • Fill amounts are Normal: (X \sim \mathcal{N}(\mu = 32.2\,\text{oz},\; \sigma = 0.3\,\text{oz})).
  • Question: (P(X > 32)).
    Computation
  • Standardize: (Z = \frac{X-\mu}{\sigma} = \frac{32-32.2}{0.3} \approx -0.67).
  • Use Normal table / software: P(Z>-0.67) = 0.7486.
    Interpretation
  • ≈ 75 % chance any single bottle contains more than the advertised 32 oz.
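The tail probability can be reproduced without a Normal table; a sketch using the standard library's error function (the helper `phi` is our own, not from the slides):

```python
import math

def phi(z):
    """Standard Normal CDF, built from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 32.2, 0.3
z = (32 - mu) / sigma    # -0.666..., rounded to -0.67 in the slides
p = 1 - phi(z)           # P(X > 32); table lookup at z = -0.67 gives 0.7486
print(round(z, 2), round(p, 4))
```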

Worked Example 2 – Soda Bottling: Carton of Four Bottles

Setup

  • Sample size (n = 4), so (\bar{X}) is Normal with:
    (\mu_{\bar{X}} = 32.2) oz
    (SE = \sigma/\sqrt{n} = 0.3/\sqrt{4} = 0.15) oz
  • Question: (P(\bar{X} > 32)).
    Calculation
  • (Z = \frac{32-32.2}{0.15} \approx -1.33).
  • P(Z>-1.33) = 0.9082.
    Interpretation
  • ≈ 91 % probability the average of 4 bottles exceeds 32 oz – higher than single-bottle case, illustrating SE shrinkage.
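The same error-function helper handles the carton case; the only change is replacing (\sigma) with the standard error:

```python
import math

def phi(z):
    """Standard Normal CDF, built from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 32.2, 0.3, 4
se = sigma / math.sqrt(n)   # 0.15 oz, the standard error of the mean
z = (32 - mu) / se          # -1.333..., rounded to -1.33 in the slides
p = 1 - phi(z)              # P(X-bar > 32); table lookup at -1.33 gives 0.9082
print(round(se, 2), round(z, 2), round(p, 4))
```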

Graphical Insights (Slides 13–15)

  • Slide shows two Normal curves sharing mean 32.2 oz but different spreads:
    Wider curve: individual bottle distribution ((\sigma = 0.3)).
    Narrower curve: sampling distribution for (n = 4) ((SE = 0.15)).
  • Shaded right-tail areas depict the two probabilities computed above (≈75 % vs 91 %).
  • Visual takeaway: averaging smooths variability, pushing probability mass closer to the mean and increasing the likelihood of values near (\mu).

Worked Example 3 – Graduate Salary Claim

Problem Statement (slide 16)

  • Dean claims population mean salary (\mu = \$800) per week, (\sigma = \$100).
  • Student samples (n = 25) recent grads; finds (\bar{x} = \$750).
    Goal
  • Assess likelihood of observing (\bar{x}\le 750) under dean’s claim.
    Assumptions
  • Raw salaries are skewed right, but (n = 25) ⇒ CLT → (\bar{X}) approximately Normal.
    Calculations
  • (SE = \sigma/\sqrt{n} = 100/\sqrt{25} = 100/5 = 20).
  • (Z = \frac{750-800}{20} = -2.5).
  • (P(\bar{X} \le 750) = P(Z \le -2.5) = 0.0062).
    Interpretation (slide 19)
  • Probability ≈ 0.62 %. Such a low likelihood implies the observed sample mean is highly inconsistent with (\mu = 800).
  • Conclusion: dean’s claim is not justified at commonly used significance levels (e.g., 5 %).
    Ethical dimension
  • Misrepresentation of program outcomes could mislead prospective students; statistical verification promotes accountability.
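The salary calculation follows the same standardization pattern; a sketch (the `phi` helper is our own construction, not from the slides):

```python
import math

def phi(z):
    """Standard Normal CDF, built from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n, xbar = 800, 100, 25, 750
se = sigma / math.sqrt(n)   # 100 / 5 = 20
z = (xbar - mu) / se        # -2.5
p = phi(z)                  # P(X-bar <= 750) = 0.0062 under the dean's claim
print(se, z, round(p, 4))
```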

95 % Confidence Interval for the Mean (Excel Example)

  • Objective: find the range of sample mean salaries we would expect if (\mu = 800).
  • Excel’s function: =CONFIDENCE(alpha, standard_dev, size) computes the half-width (E) of a two-sided z-interval.
    • (\alpha = 0.05) (for 95 % confidence)
    • (\sigma = 100)
    • (n = 25)
  • Result: (E = 41.27). (Note: the z-based CONFIDENCE formula gives (1.96 \times 20 = 39.20); the quoted 41.27 instead matches the t-based half-width (t_{0.975,24} \times 20 \approx 41.28), i.e., Excel's CONFIDENCE.T.)
  • 95 % CI for (\mu): (800 \pm 41.27 \;\Rightarrow\; (758.73,\ 841.27)).
  • Observed (\bar{x}=750) lies outside this interval ⇒ further evidence against the claim.
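A sketch reproducing the half-width in Python. One hedged assumption: the critical values below are the standard (z_{0.975} \approx 1.96) and (t_{0.975,24} \approx 2.064); the t-based version matches the slide's 41.27:

```python
import math

sigma, n = 100, 25
se = sigma / math.sqrt(n)          # 20

# z-based half-width (what Excel's CONFIDENCE computes)
E_z = 1.959964 * se
print(round(E_z, 2))               # 39.2

# t-based half-width (Excel's CONFIDENCE.T with df = 24)
E_t = 2.0639 * se                  # ~41.28, the value quoted on the slide
print(round(E_t, 2))

# Either way, the observed x-bar = 750 falls outside 800 +/- E
print(800 - E_t, 800 + E_t)
```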

Practical Connections & Implications

  • Quality control (bottling): monitoring (\bar{X}) over groups of items gives quicker detection of mean shifts than single-unit checks.
  • Business analytics & claims testing: sampling distributions underpin formal hypothesis tests and CI’s that protect stakeholders from unfounded assertions.
  • Decision thresholds: choosing sample size affects SE; bigger samples yield tighter inferences but cost more.
  • Recognize skewness & robustness: CLT helps, but for heavy-tailed or highly skewed data, a larger (n) or non-parametric methods might be prudent.

Key Takeaways

  • Sampling distributions describe how a statistic fluctuates across repeated samples.
  • The mean of the sampling distribution equals the true population mean; its spread (SE) diminishes like (1/\sqrt{n}).
  • CLT is a cornerstone: regardless of population shape, sufficiently large samples make (\bar{X}) nearly Normal.
  • Real-world examples (soda fill, salaries) illustrate computations of tail probabilities, hypothesis evaluation, and confidence intervals.
  • Misinterpretation risk: always distinguish between population variability ((\sigma)) and sampling variability (SE).
  • Statistical findings have ethical and practical consequences—claims must be evidence-based.