MATH1041 Lecture 6 – Statistical Inference: Hypothesis Testing & Central Limit Theorem

Confidence Intervals: Quick Revision

  • Goal: estimate the population mean $\mu$ with a range rather than a single number.

  • Standard ($\sigma$ known): $CI_{C}(\mu)=\Big[\bar X\;\pm\;z^*\dfrac{\sigma}{\sqrt n}\Big]$ where $P(-z^*<Z<z^*)=C$ for $Z\sim N(0,1)$.

  • Common 95 % choice: $z^*=1.96$.

  • Interpretation pitfalls (TRUE/FALSE slide):

    • The probability that the fixed true mean lies in a realised interval is NOT 95 % (the parameter is not random).

    • Narrower interval (same confidence) ⇒ more precision.

    • Repeating the study 20 times → on average about 1 interval misses $\mu$.
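As a minimal sketch (with made-up data and an assumed known $\sigma=2$), the z-interval above can be computed directly:

```python
from statistics import NormalDist, mean

def z_confidence_interval(x, sigma, conf=0.95):
    """CI for the mean with sigma known: x̄ ± z* · sigma / √n."""
    n = len(x)
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # central C ⇒ upper (1+C)/2 quantile
    half_width = z_star * sigma / n ** 0.5
    xbar = mean(x)
    return xbar - half_width, xbar + half_width

# Hypothetical sample; sigma = 2 assumed known
lo, hi = z_confidence_interval([4.8, 5.1, 5.6, 4.9, 5.2, 5.4], sigma=2)
```

For `conf=0.95` the quantile `z_star` evaluates to 1.96, matching the slide.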

When $\sigma$ is Unknown ➜ t-distribution

  • Replace $\sigma$ with the sample standard deviation $s$: $s=\sqrt{\dfrac{\sum(x_i-\bar x)^2}{n-1}}$.

  • Extra variability ⇒ heavier tails.

  • If $X_i\stackrel{iid}{\sim}N(\mu,\sigma^2)$ then $T=\dfrac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$.

  • 95 % quantiles shrink toward 1.96 as df ↑ (e.g., $t_{1}=\pm12.706$, $t_{5}=\pm2.571$, $t_{10}=\pm2.228$).

  • Generic CI when $\sigma$ unknown: $CI_{C}(\mu)=\Big[\bar X\;\pm\;t^*\dfrac{S}{\sqrt n}\Big]$ where $t^*$ cuts off the central probability $C$ of $t_{n-1}$.
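A sketch of the t-interval, hardcoding $t^*=2.228$ for $n=11$ (df = 10) from the quantiles listed above; the data are invented for illustration:

```python
from statistics import mean, stdev

def t_confidence_interval(x, t_star):
    """CI for the mean with sigma unknown: x̄ ± t* · s / √n."""
    n = len(x)
    s = stdev(x)  # divides by n - 1, matching the formula for s above
    half_width = t_star * s / n ** 0.5
    return mean(x) - half_width, mean(x) + half_width

# Hypothetical sample of n = 11 ⇒ df = 10 ⇒ t* = 2.228 (from the quantile list above)
x = [5.2, 4.9, 5.5, 5.0, 5.3, 4.7, 5.1, 5.4, 5.0, 5.2, 4.8]
lo, hi = t_confidence_interval(x, t_star=2.228)
```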

Hypothesis Testing: Assume the Opposite

  • Essence = probabilistic proof by contradiction.

  • Four canonical steps:

    1. State the alternative $H_a$ and the null $H_0$ (counter-claim).

    2. Choose a test statistic, compute its value $z_{obs}$ or $t_{obs}$, and give its null distribution.

    3. Compute the P-value $= P_{H_0}(\text{statistic as or more extreme than observed})$.

    4. Draw a conclusion in plain English and check the assumptions.

  • Test statistic template: $\dfrac{\text{estimator}-\text{value under }H_0}{\text{sd or se of the numerator}}$.

One- vs Two-sided alternatives

  • $H_a: \mu>\mu_0$ ⇒ $P = P(Z\ge z_{obs})$

  • $H_a: \mu<\mu_0$ ⇒ $P = P(Z\le z_{obs})$

  • $H_a: \mu\ne\mu_0$ ⇒ $P = 2P(Z\le -|z_{obs}|)$.
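The test-statistic template and the three tail rules can be sketched as one helper (z approach, so $\sigma$ is assumed known; the sample mean 36.78 below is hypothetical, chosen to roughly match the body-temperature numbers):

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """z = (x̄ − μ0)/(σ/√n), then take the tail matching the alternative."""
    z_obs = (xbar - mu0) / (sigma / n ** 0.5)
    Z = NormalDist()
    if alternative == "greater":      # Ha: mu > mu0 ⇒ P(Z >= z_obs)
        p = 1 - Z.cdf(z_obs)
    elif alternative == "less":       # Ha: mu < mu0 ⇒ P(Z <= z_obs)
        p = Z.cdf(z_obs)
    else:                             # Ha: mu != mu0 ⇒ 2 P(Z <= -|z_obs|)
        p = 2 * Z.cdf(-abs(z_obs))
    return z_obs, p

# Hypothetical sample mean 36.78 with sigma = 0.4, n = 106
z_obs, p = z_test(36.78, 37, 0.4, 106, "less")
```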

"Worst-case" boundary
  • For a one-sided $H_0$ (e.g., $\mu\ge\mu_0$) the largest P-value occurs at the boundary $\mu=\mu_0$.

  • Hence the null distribution is taken at equality even when $H_0$ is expressed with $\ge$ or $\le$.

Illustrative Proof-by-Contradiction Analogies

  • Pigeonhole/pizza slices.

  • Largest integer.

  • Unfair coin (100 tosses, 90 heads) illustrating binomial tail computation.
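The unfair-coin analogy made concrete: under $H_0$ (a fair coin, $p=0.5$), the chance of 90 or more heads in 100 tosses is an exact upper binomial tail. A minimal sketch:

```python
from math import comb

def binom_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# P-value for "90 heads in 100 tosses" against a fair coin
p_value = binom_upper_tail(100, 90)
```

The result is astronomically small, which is the contradiction that lets us reject the fair-coin hypothesis.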

Worked Examples

• Body temperature (n=106, $\sigma=0.4$)

  • $H_0: \mu\ge37$, $H_a: \mu<37$; $z_{obs}=-5.66$, $P=7.6\times10^{-9}$ → very strong evidence that the mean is below 37 °C.

• Milk freezing (n=5, $\sigma=0.008$)

  • $H_0: \mu=-0.545$ vs $H_a: \mu>-0.545$; $P=0.0252$ → probable adulteration.

• Corn yield (n=15, $\sigma=10$) vs 110 bu/acre

  • $z_{obs}=5.48$, $P\approx2.1\times10^{-8}$ → higher yield.

• Lead in soil (n=27, $s=10$): $H_0: \mu\ge86$, $H_a: \mu<86$; $t_{obs}=-1.56$, $P\approx0.07$ → weak evidence.

P-values: Interpretation & Misuse

  • Small ⇒ data this extreme are unlikely under $H_0$, hence evidence against $H_0$ (not the probability that $H_0$ is false).

  • Large ⇒ insufficient evidence against $H_0$; it never "proves" $H_0$.

  • Pitfalls:

    • dependence on chosen test statistic (heads vs alternations example).

    • sample-size inflation: with very large n, even practically negligible deviations from $H_0$ yield tiny P.

    • multiple testing inflates false discoveries (coin 6-heads in 10 batches example).

    • p-hacking & reproducibility crisis.

  • Avoid the binary label "statistically significant"; follow ATOM: Accept uncertainty, be Thoughtful, Open, Modest.

Significance Levels (legacy concept)

  • Pre-chosen threshold α\alpha (often 0.05) historically guided decisions.

  • Recognised as arbitrary; current advice: report exact P, effect size, CI, context.

Z-test vs t-test Summary

  • Z-test requirements: normal data (or CLT), $\sigma$ known.

  • t-test: normal data (or CLT), $\sigma$ unknown; null distribution $t_{n-1}$.

  • Large n: $t_{n-1} \approx N(0,1)$.

Central Limit Theorem (CLT)

  • Conditions: $X_1,\dots,X_n$ iid with mean $\mu$ and sd $\sigma$; n large.

  • Result: $\bar X \;\approx\; N\Big(\mu,\dfrac{\sigma}{\sqrt n}\Big)$.

  • Consequently $\sum X_i \;\approx\; N(n\mu,\sqrt n\,\sigma)$.

  • Accuracy ↑ with n; more skewed parent ⇒ bigger n needed.

Demonstrations
  • Uniform(0,1): averages of n=2,5,10,100 approach bell curve.

  • Highly skewed $\chi^2_1$: averages for n=2,5,10,100,200 become normal.

  • Family birthdays: individual dates roughly uniform; the average over 2 or 4 dates becomes normal.
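A quick simulation along the same lines (Uniform(0,1), averages of n = 10 draws; 20000 replicates is an arbitrary choice):

```python
import random
from statistics import mean, stdev

random.seed(1)

def sample_means(n, reps=20000):
    """Simulate the sampling distribution of the mean of n Uniform(0,1) draws."""
    return [mean(random.random() for _ in range(n)) for _ in range(reps)]

means = sample_means(10)
# CLT prediction: mean ≈ 0.5, sd ≈ sqrt(1/12)/sqrt(10) ≈ 0.0913
```

A histogram of `means` would show the bell shape from the slides; the numerical moments already match the CLT prediction.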

Discrete CLT Uses
  • Binomial: $X\sim B(n,p) \Rightarrow X \approx N\big(np,\sqrt{np(1-p)}\big)$.

  • Poisson: for large $\lambda$, $X\sim \text{Pois}(\lambda) \approx N(\lambda,\sqrt\lambda)$.
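A sketch checking the binomial approximation for $B(100, 0.5)$, with a continuity correction (the 55.5 cut-off is my addition, not from the slides):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Exact P(X <= 55) for X ~ B(100, 0.5)
exact = sum(comb(n, k) * 0.5**n for k in range(56))

# CLT approximation: X ≈ N(np, sqrt(np(1-p))), with continuity correction
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(55.5)
```

The two probabilities agree to about three decimal places here; agreement degrades in the far tails and for small n.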

Assumption Checking & Robustness

  • Independence: ensured via random sampling (with replacement) or simple random sample from large pop.

  • Normality diagnostics: histogram, QQ-plot.

  • Rules of thumb for using t/CLT:

    • $n\le15$: need (near) normal data and no outliers.

    • $15<n<40$: moderate skew OK; no extreme outliers.

    • $n\ge40$: t-procedures robust to strong skew; still vulnerable to gross outliers.

  • Remedies for non-normal small samples

    • transform (log, $\sqrt x$) – iPod song duration example (log-transformed data produced a near-normal QQ-plot).

    • non-parametric tests (e.g. sign test).
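A sketch of the log-transform remedy on simulated right-skewed data (standing in for the song durations, which are not reproduced here; the lognormal model is my assumption):

```python
import math
import random

random.seed(42)

def skewness(x):
    """Sample skewness: average cubed standardised deviation."""
    n = len(x)
    m = sum(x) / n
    s = (sum((v - m) ** 2 for v in x) / (n - 1)) ** 0.5
    return sum(((v - m) / s) ** 3 for v in x) / n

raw = [random.lognormvariate(0, 1) for _ in range(500)]  # right-skewed, like durations
logged = [math.log(v) for v in raw]                      # log transform → roughly normal
```

The raw sample is strongly right-skewed; after the log transform the skewness is close to zero, which is why the QQ-plot straightens out.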

Large-Sample Inference without Normality Assurance

  • If n large, can still use Z-type tests/CI by CLT even when distribution unknown.

  • R package ‘asympTest’ provides asymptotic Z tests.

Example: Calcium in Pregnant Women (n=180)

  • $H_0: \mu=9.5$, $H_a: \mu\ne9.5$; $t_{obs}=2.68$ ⇒ $P=0.008$ → strong evidence of a difference.

  • 95 % CI ($\sigma$ unknown): $[9.52,\,9.64]$ mg/dL does not include 9.5.
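Because $n=180$ is large, $t_{179}\approx N(0,1)$, so the reported P-value can be sanity-checked with a normal tail:

```python
from statistics import NormalDist

# Two-sided P-value for t_obs = 2.68, approximating t_179 by N(0,1)
t_obs = 2.68
p_approx = 2 * NormalDist().cdf(-abs(t_obs))
```

The normal approximation gives roughly 0.007, close to the exact t-based value of 0.008.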

Practical R Functions

  • z.test(x, mu, sigma.x, alternative, conf.level) from BSDA.

  • t.test(x, mu, alternative, conf.level).

  • asymp.test(x, parameter="mean", …) for large-sample Z.

  • confint(lm(y~1)) quickly produces one-sample t-CI.

P-hacking & Reproducibility

  • Definition: manipulating analysis to achieve small P.

  • Solutions: pre-registration, transparency, larger samples, focus on effect size, Bayesian alternatives.

Consulting & Further Help

  • If analysis beyond expertise: consult statisticians; at UNSW use STATS CENTRAL.

Keywords (Chapter 6)

  • Hypothesis testing, Null/Alternative hypothesis, Test statistic, Null distribution, P-value, One-/Two-sided, Z-test, t-test, Significance level, Central Limit Theorem, Proof by contradiction, Robustness, P-hacking.

Essential Take-Home Messages

  • Hypothesis testing mirrors logical contradiction: assume H0, seek strong evidence against it.

  • P-value is an evidence metric, not truth probability.

  • Z vs t hinges on knowledge of σ\sigma; with large n the distinction fades.

  • CLT underpins nearly all large-sample inference, making normal theory pervasive.

  • Always verify assumptions; transform or adopt non-parametric methods when violated.

  • Avoid binary “significant/not” language; report full context, uncertainties, and magnitude.