MATH1041 Lecture 6 – Statistical Inference: Hypothesis Testing & Central Limit Theorem

Confidence Intervals: Quick Revision

  • Goal: estimate the population mean $\mu$ with a range rather than a single number.

  • Standard ($\sigma$ known): $CI_{C}(\mu)=\Big[\bar X\;\pm\;z^*\dfrac{\sigma}{\sqrt n}\Big]$ where $P(-z^*<Z<z^*)=C$ for $Z\sim N(0,1)$.

  • Common 95 % choice: $z^*=1.96$.

  • Interpretation pitfalls (TRUE/FALSE slide):

    • The probability that the fixed true mean lies in a realised interval is NOT 95 % (the parameter is not random).

    • Narrower interval (same confidence) ⇒ more precision.

    • Repeating the study 20 times → on average about 1 interval misses $\mu$.
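As a minimal sketch (with made-up data and an assumed known $\sigma=2$), the z-interval above can be computed directly:

```python
from statistics import NormalDist, mean

def z_confidence_interval(x, sigma, conf=0.95):
    """CI for the mean with sigma known: x̄ ± z* · sigma / √n."""
    n = len(x)
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # central C ⇒ upper (1+C)/2 quantile
    half_width = z_star * sigma / n ** 0.5
    xbar = mean(x)
    return xbar - half_width, xbar + half_width

# Hypothetical sample; sigma = 2 assumed known
lo, hi = z_confidence_interval([4.8, 5.1, 5.6, 4.9, 5.2, 5.4], sigma=2)
```

For `conf=0.95` the quantile `z_star` evaluates to 1.96, matching the slide.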

When $\sigma$ is Unknown ➜ t-distribution

  • Replace $\sigma$ with the sample standard deviation $s$: $s=\sqrt{\dfrac{\sum(x_i-\bar x)^2}{n-1}}$.

  • Extra variability ⇒ heavier tails.

  • If $X_i\stackrel{iid}{\sim}N(\mu,\sigma^2)$ then $T=\dfrac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$.

  • 95 % quantiles shrink toward 1.96 as df ↑ (e.g., $t_{1}=\pm12.706$, $t_{5}=\pm2.571$, $t_{10}=\pm2.228$).

  • Generic CI when $\sigma$ unknown: $CI_{C}(\mu)=\Big[\bar X\;\pm\;t^*\dfrac{S}{\sqrt n}\Big]$ where $t^*$ cuts off the central probability $C$ of $t_{n-1}$.
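A sketch of the t-interval, hardcoding $t^*=2.228$ for $n=11$ (df = 10) from the quantiles listed above; the data are invented for illustration:

```python
from statistics import mean, stdev

def t_confidence_interval(x, t_star):
    """CI for the mean with sigma unknown: x̄ ± t* · s / √n."""
    n = len(x)
    s = stdev(x)  # divides by n - 1, matching the formula for s above
    half_width = t_star * s / n ** 0.5
    return mean(x) - half_width, mean(x) + half_width

# Hypothetical sample of n = 11 ⇒ df = 10 ⇒ t* = 2.228 (from the quantile list above)
x = [5.2, 4.9, 5.5, 5.0, 5.3, 4.7, 5.1, 5.4, 5.0, 5.2, 4.8]
lo, hi = t_confidence_interval(x, t_star=2.228)
```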

Hypothesis Testing: Assume the Opposite

  • Essence = probabilistic proof by contradiction.

  • Four canonical steps:

    1. State the alternative $H_a$ and the null $H_0$ (counter-claim).

    2. Choose a test statistic, compute its value $z_{obs}$ or $t_{obs}$, and give its null distribution.

    3. Compute the P-value $= P_{H_0}(\text{statistic as or more extreme than observed})$.

    4. Draw a conclusion in plain English and check the assumptions.

  • Test statistic template: $\dfrac{\text{estimator}-\text{value under }H_0}{\text{sd or se of the numerator}}$.

One- vs Two-sided alternatives

  • $H_a: \mu>\mu_0$ ⇒ $P = P(Z\ge z_{obs})$

  • $H_a: \mu<\mu_0$ ⇒ $P = P(Z\le z_{obs})$

  • $H_a: \mu\ne\mu_0$ ⇒ $P = 2P(Z\le -|z_{obs}|)$.
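The test-statistic template and the three tail rules can be sketched as one helper (z approach, so $\sigma$ is assumed known; the sample mean 36.78 below is hypothetical, chosen to roughly match the body-temperature numbers):

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """z = (x̄ − μ0)/(σ/√n), then take the tail matching the alternative."""
    z_obs = (xbar - mu0) / (sigma / n ** 0.5)
    Z = NormalDist()
    if alternative == "greater":      # Ha: mu > mu0 ⇒ P(Z >= z_obs)
        p = 1 - Z.cdf(z_obs)
    elif alternative == "less":       # Ha: mu < mu0 ⇒ P(Z <= z_obs)
        p = Z.cdf(z_obs)
    else:                             # Ha: mu != mu0 ⇒ 2 P(Z <= -|z_obs|)
        p = 2 * Z.cdf(-abs(z_obs))
    return z_obs, p

# Hypothetical sample mean 36.78 with sigma = 0.4, n = 106
z_obs, p = z_test(36.78, 37, 0.4, 106, "less")
```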

"Worst-case" boundary
  • For a one-sided $H_0$ (e.g., $\mu\ge\mu_0$) the largest P-value occurs at the boundary $\mu=\mu_0$.

  • Hence the null distribution is taken at equality even when $H_0$ is expressed with $\ge$ or $\le$.

Illustrative Proof-by-Contradiction Analogies

  • Pigeonhole/pizza slices.

  • Largest integer.

  • Unfair coin (100 tosses, 90 heads) illustrating binomial tail computation.
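The unfair-coin analogy made concrete: under $H_0$ (a fair coin, $p=0.5$), the chance of 90 or more heads in 100 tosses is an exact upper binomial tail. A minimal sketch:

```python
from math import comb

def binom_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# P-value for "90 heads in 100 tosses" against a fair coin
p_value = binom_upper_tail(100, 90)
```

The result is astronomically small, which is the contradiction that lets us reject the fair-coin hypothesis.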

Worked Examples

• Body temperature (n=106, $\sigma=0.4$)

  • $H_0: \mu\ge37$, $H_a: \mu<37$; $z_{obs}=-5.66$, $P=7.6\times10^{-9}$ → very strong evidence that the mean is below 37 °C.

• Milk freezing (n=5, $\sigma=0.008$)

  • $H_0: \mu=-0.545$ vs $H_a: \mu>-0.545$; $P=0.0252$ → probable adulteration.

• Corn yield (n=15, $\sigma=10$) vs 110 bu/acre

  • $z_{obs}=5.48$, $P\approx2.1\times10^{-8}$ → higher yield.

• Lead in soil (n=27, $s=10$): $H_0: \mu\ge86$, $H_a: \mu<86$; $t_{obs}=-1.56$, $P\approx0.07$ → weak evidence.

P-values: Interpretation & Misuse

  • Small ⇒ data this extreme are unlikely under $H_0$, hence evidence against $H_0$ (not the probability that $H_0$ is false).

  • Large ⇒ insufficient evidence against $H_0$; it never "proves" $H_0$.

  • Pitfalls:

    • dependence on chosen test statistic (heads vs alternations example).

    • sample-size inflation: with very large n, even practically negligible deviations from $H_0$ yield tiny P.

    • multiple testing inflates false discoveries (coin 6-heads in 10 batches example).

    • p-hacking & reproducibility crisis.

  • Avoid the binary label "statistically significant"; follow ATOM: Accept uncertainty, be Thoughtful, Open, Modest.

Significance Levels (legacy concept)

  • Pre-chosen threshold α\alpha (often 0.05) historically guided decisions.

  • Recognised as arbitrary; current advice: report exact P, effect size, CI, context.

Z-test vs t-test Summary

  • Z-test requirements: normal data (or CLT), $\sigma$ known.

  • t-test: normal data (or CLT), $\sigma$ unknown; null distribution $t_{n-1}$.

  • Large n: $t_{n-1} \approx N(0,1)$.

Central Limit Theorem (CLT)

  • Conditions: $X_1,\dots,X_n$ iid with mean $\mu$ and sd $\sigma$; n large.

  • Result: $\bar X \;\approx\; N\Big(\mu,\dfrac{\sigma}{\sqrt n}\Big)$.

  • Consequently $\sum X_i \;\approx\; N(n\mu,\sqrt n\,\sigma)$.

  • Accuracy ↑ with n; more skewed parent ⇒ bigger n needed.

Demonstrations
  • Uniform(0,1): averages of n=2,5,10,100 approach bell curve.

  • Highly skewed $\chi^2_1$: averages for n=2,5,10,100,200 become normal.

  • Family birthdays: individual dates roughly uniform; the average over 2 or 4 dates becomes normal.
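A quick simulation along the same lines (Uniform(0,1), averages of n = 10 draws; 20000 replicates is an arbitrary choice):

```python
import random
from statistics import mean, stdev

random.seed(1)

def sample_means(n, reps=20000):
    """Simulate the sampling distribution of the mean of n Uniform(0,1) draws."""
    return [mean(random.random() for _ in range(n)) for _ in range(reps)]

means = sample_means(10)
# CLT prediction: mean ≈ 0.5, sd ≈ sqrt(1/12)/sqrt(10) ≈ 0.0913
```

A histogram of `means` would show the bell shape from the slides; the numerical moments already match the CLT prediction.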

Discrete CLT Uses
  • Binomial: $X\sim B(n,p) \Rightarrow X \approx N\big(np,\sqrt{np(1-p)}\big)$.

  • Poisson: for large $\lambda$, $X\sim \text{Pois}(\lambda) \approx N(\lambda,\sqrt\lambda)$.
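A sketch checking the binomial approximation for $B(100, 0.5)$, with a continuity correction (the 55.5 cut-off is my addition, not from the slides):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Exact P(X <= 55) for X ~ B(100, 0.5)
exact = sum(comb(n, k) * 0.5**n for k in range(56))

# CLT approximation: X ≈ N(np, sqrt(np(1-p))), with continuity correction
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(55.5)
```

The two probabilities agree to about three decimal places here; agreement degrades in the far tails and for small n.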

Assumption Checking & Robustness

  • Independence: ensured via random sampling (with replacement) or simple random sample from large pop.

  • Normality diagnostics: histogram, QQ-plot.

  • Rules of thumb for using t/CLT:

    • $n\le15$: need (near) normal data and no outliers.

    • $15<n<40$: moderate skew OK; no extreme outliers.

    • $n\ge40$: t-procedures robust to strong skew; still vulnerable to gross outliers.

  • Remedies for non-normal small samples

    • transform (log, $\sqrt x$) – iPod song duration example (log-transformed data produced a near-normal QQ-plot).

    • non-parametric tests (e.g. sign test).
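A sketch of the log-transform remedy on simulated right-skewed data (standing in for the song durations, which are not reproduced here; the lognormal model is my assumption):

```python
import math
import random

random.seed(42)

def skewness(x):
    """Sample skewness: average cubed standardised deviation."""
    n = len(x)
    m = sum(x) / n
    s = (sum((v - m) ** 2 for v in x) / (n - 1)) ** 0.5
    return sum(((v - m) / s) ** 3 for v in x) / n

raw = [random.lognormvariate(0, 1) for _ in range(500)]  # right-skewed, like durations
logged = [math.log(v) for v in raw]                      # log transform → roughly normal
```

The raw sample is strongly right-skewed; after the log transform the skewness is close to zero, which is why the QQ-plot straightens out.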

Large-Sample Inference without Normality Assurance

  • If n large, can still use Z-type tests/CI by CLT even when distribution unknown.

  • R package ‘asympTest’ provides asymptotic Z tests.

Example: Calcium in Pregnant Women (n=180)

  • $H_0: \mu=9.5$, $H_a: \mu\ne9.5$; $t_{obs}=2.68$ ⇒ $P=0.008$ → strong evidence of a difference.

  • 95 % CI ($\sigma$ unknown): $[9.52,\,9.64]$ mg/dL does not include 9.5.
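Because $n=180$ is large, $t_{179}\approx N(0,1)$, so the reported P-value can be sanity-checked with a normal tail:

```python
from statistics import NormalDist

# Two-sided P-value for t_obs = 2.68, approximating t_179 by N(0,1)
t_obs = 2.68
p_approx = 2 * NormalDist().cdf(-abs(t_obs))
```

The normal approximation gives roughly 0.007, close to the exact t-based value of 0.008.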

Practical R Functions

  • z.test(x, mu, sigma.x, alternative, conf.level) from BSDA.

  • t.test(x, mu, alternative, conf.level).

  • asymp.test(x, parameter="mean", …) for large-sample Z.

  • confint(lm(y~1)) quickly produces one-sample t-CI.

P-hacking & Reproducibility

  • Definition: manipulating analysis to achieve small P.

  • Solutions: pre-registration, transparency, larger samples, focus on effect size, Bayesian alternatives.

Consulting & Further Help

  • If analysis beyond expertise: consult statisticians; at UNSW use STATS CENTRAL.

Keywords (Chapter 6)

  • Hypothesis testing, Null/Alternative hypothesis, Test statistic, Null distribution, P-value, One-/Two-sided, Z-test, t-test, Significance level, Central Limit Theorem, Proof by contradiction, Robustness, P-hacking.

Essential Take-Home Messages

  • Hypothesis testing mirrors logical contradiction: assume H0, seek strong evidence against it.

  • P-value is an evidence metric, not truth probability.

  • Z vs t hinges on knowledge of σ\sigma; with large n the distinction fades.

  • CLT underpins nearly all large-sample inference, making normal theory pervasive.

  • Always verify assumptions; transform or adopt non-parametric methods when violated.

  • Avoid binary “significant/not” language; report full context, uncertainties, and magnitude.