MATH1041 Lecture 6 – Statistical Inference: Hypothesis Testing & Central Limit Theorem
Confidence Intervals: Quick Revision
Goal: estimate population mean with a range rather than a single number.
Standard CI (\sigma known): \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}, where P(-z^* < Z < z^*) = C for Z \sim N(0,1).
Common 95 % choice: z^* = 1.96, giving \bar{x} \pm 1.96\,\sigma/\sqrt{n}.
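The lecture's software examples use R, but the arithmetic is easy to check in any language. A minimal Python sketch of the known-\sigma interval (the summary statistics here are hypothetical, chosen only for illustration):

```python
import math

def z_confidence_interval(xbar, sigma, n, z_star=1.96):
    """CI for the mean when sigma is known: xbar +/- z* * sigma / sqrt(n)."""
    margin = z_star * sigma / math.sqrt(n)
    return (xbar - margin, xbar + margin)

# Hypothetical summary statistics: xbar = 50, sigma = 4, n = 25.
# margin = 1.96 * 4/5 = 1.568, so the interval is (48.432, 51.568).
lo, hi = z_confidence_interval(xbar=50.0, sigma=4.0, n=25)
```

For a different confidence level, replace 1.96 with the matching standard normal quantile.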
Interpretation pitfalls (TRUE/FALSE slide):
• The probability that the fixed true mean lies in a realised interval is NOT 95 % (the parameter is not random).
• Narrower interval (same confidence) ⇒ more precision.
• Repeating the study 20 times → \approx 1 interval misses the true mean on average.
When \sigma is Unknown ➜ t-distribution
Replace \sigma with the sample standard deviation s: T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.
Extra variability ⇒ heavier tails.
If X_1, \dots, X_n \sim N(\mu, \sigma^2) (iid) then T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.
95 % quantiles shrink toward 1.96 as df ↑ (e.g., t^* = 2.228 for df = 10 vs 1.984 for df = 100).
Generic CI when \sigma unknown: \bar{x} \pm t^* \frac{s}{\sqrt{n}}, where t^* cuts off the central proportion C of the t_{n-1} distribution.
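The same computation with \sigma unknown swaps in s and a t quantile. A Python sketch (the data are hypothetical; 2.776 is the tabulated t_{0.975} quantile for df = 4, which would normally come from tables or software):

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """CI for the mean when sigma is unknown: xbar +/- t* * s / sqrt(n).
    t_star must be the appropriate t_{n-1} quantile (from tables/software)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample sd (divides by n - 1)
    margin = t_star * s / math.sqrt(n)
    return (xbar - margin, xbar + margin)

# Hypothetical data with n = 5, so df = 4 and t* = 2.776 for 95 % confidence.
lo, hi = t_confidence_interval([9.8, 10.2, 10.1, 9.9, 10.0], t_star=2.776)
```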
Hypothesis Testing: Assume the Opposite
Essence = probabilistic proof by contradiction.
Four canonical steps:
1. State the alternative Ha and the null H0 (counter-claim).
2. Choose a test statistic, compute its observed value (z_0 or t_0), and give its null distribution.
3. Compute the P-value = P(a test statistic at least as extreme as the observed one, assuming H0 is true).
4. Draw the conclusion in plain English and check assumptions.
Test statistic template: \frac{\text{estimate} - \text{hypothesised value}}{\text{standard error}}, e.g. t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.
One- vs Two-sided alternatives
Ha: \mu > \mu_0 ⇒ P = P(T \ge t_0).
Ha: \mu < \mu_0 ⇒ P = P(T \le t_0).
Ha: \mu \ne \mu_0 ⇒ P = 2\,P(T \ge |t_0|).
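These three tail rules can be written out directly for a Z statistic using the standard normal CDF (a Python sketch; the function names are mine, not from the lecture):

```python
import math

def norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_pvalue(z0, alternative):
    """P-value for an observed Z statistic z0 under the N(0,1) null."""
    if alternative == "greater":        # Ha: mu > mu0
        return 1 - norm_cdf(z0)
    if alternative == "less":           # Ha: mu < mu0
        return norm_cdf(z0)
    return 2 * (1 - norm_cdf(abs(z0)))  # Ha: mu != mu0 (two-sided)

# For z0 = 1.96 the two-sided P-value is approximately 0.05.
p_two = z_pvalue(1.96, "two.sided")
```

For a t statistic the same logic applies with the t_{n-1} CDF in place of norm_cdf.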
"Worst-case" boundary
For a unilateral H0 (e.g., H0: \mu \le \mu_0), the largest P-value occurs at the boundary value \mu = \mu_0.
Hence the null distribution is taken at equality even when H0 is expressed with an inequality (\le or \ge).
Illustrative Proof-by-Contradiction Analogies
Pigeonhole/pizza slices.
Largest integer.
Unfair coin (100 tosses, 90 heads) illustrating binomial tail computation.
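The coin example's tail probability is an exact binomial sum, which can be verified directly (a Python sketch):

```python
import math

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Bin(n, p), by summing exact binomial probabilities."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Fair-coin null: probability of 90 or more heads in 100 tosses.
# The result is vanishingly small -- overwhelming evidence against fairness.
p_90 = binom_tail(100, 0.5, 90)
```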
Worked Examples
• Body temperature (n = 106): H0: \mu = 37 vs Ha: \mu < 37; very small P-value → very strong evidence the mean is below 37 °C.
• Milk freezing (n = 5): H0: \mu = -0.545 vs Ha: \mu > -0.545 (added water raises the freezing point); small P-value → probable adulteration.
• Corn yield (n = 15): new variety vs 110 bu/acre; small P-value → evidence of higher yield.
• Lead in soil (n = 27): H0: \mu = 86 vs Ha: \mu < 86; large P-value → weak evidence.
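All four examples reduce to the same t-statistic arithmetic. A Python sketch with hypothetical summary statistics (the lecture's actual sample values are not reproduced here):

```python
import math

def t_statistic(xbar, s, n, mu0):
    """Observed one-sample t statistic: t0 = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical values in the spirit of the body-temperature example:
# xbar = 36.8 and s = 0.4 are illustrative only.
t0 = t_statistic(xbar=36.8, s=0.4, n=106, mu0=37.0)
# A large negative t0, compared against t_{105}, supports Ha: mu < 37.
```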
P-values: Interpretation & Misuse
Small P ⇒ data unlikely under H0, hence evidence against H0 (not the probability that H0 is false).
Large P ⇒ insufficient evidence; a test can never "prove" H0.
Pitfalls:
• dependence on chosen test statistic (heads vs alternations example).
• sample-size inflation: with a huge sample, tiny deviations from H0 ⇒ tiny P-values.
• multiple testing inflates false discoveries (coin 6-heads in 10 batches example).
• p-hacking & reproducibility crisis.
Do not reduce findings to the label "statistically significant"; follow ATOM: Accept uncertainty, be Thoughtful, Open, and Modest.
Significance Levels (legacy concept)
A pre-chosen threshold \alpha (often 0.05) historically guided accept/reject decisions.
Recognised as arbitrary; current advice: report exact P, effect size, CI, context.
Z-test vs t-test Summary
Z-test requirements: normal population (or CLT applies), \sigma known; null distribution N(0,1).
t-test: normal population (or CLT applies), \sigma unknown; null distribution t_{n-1}.
Large n: t_{n-1} \approx N(0,1), so the two tests give almost identical results.
Central Limit Theorem (CLT)
Conditions: X_1, \dots, X_n iid with mean \mu and sd \sigma; n large.
Result: \bar{X} \approx N(\mu, \sigma^2/n), i.e. \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to N(0,1) as n \to \infty.
Consequently sample means (and sums) are approximately normal whatever the parent distribution.
Accuracy ↑ with n; more skewed parent ⇒ bigger n needed.
Demonstrations
Uniform(0,1): averages of n=2,5,10,100 approach bell curve.
Highly skewed parent distribution: averages of n=2,5,10,100,200 become normal.
Family birthdays: individual day uniform-like; average over 2,4 dates becomes normal.
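The Uniform(0,1) demonstration is easy to reproduce by simulation; a Python sketch (the sample size and repetition count are arbitrary choices, not the lecture's):

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def sample_means(n, reps=2000):
    """Draw reps means, each of n Uniform(0,1) values."""
    return [statistics.mean(random.random() for _ in range(n))
            for _ in range(reps)]

# Uniform(0,1) has mean 0.5 and sd sqrt(1/12) ~= 0.2887, so by the CLT
# the means should centre near 0.5 with sd near 0.2887 / sqrt(100) ~= 0.029.
means = sample_means(n=100)
```

Plotting a histogram of `means` for n = 2, 5, 10, 100 reproduces the bell-curve progression described above.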
Discrete CLT Uses
Binomial: X \sim Bin(n, p) with np and n(1-p) both large ⇒ X \approx N(np, np(1-p)).
Poisson: X \sim Poisson(\lambda) with \lambda large ⇒ X \approx N(\lambda, \lambda).
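The binomial approximation can be checked against the exact probability (a Python sketch; the 0.5 adjustment is the standard continuity correction, not something stated in the notes):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binom_normal_approx(n, p, k):
    """P(X <= k) for Bin(n, p) via the CLT: N(np, np(1-p)),
    with a continuity correction of +0.5."""
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    return norm_cdf((k + 0.5 - mu) / sd)

# Exact P(X <= 55) for Bin(100, 0.5) vs its normal approximation.
exact = sum(math.comb(100, i) for i in range(56)) / 2**100
approx = binom_normal_approx(100, 0.5, 55)
```

With n = 100 the two values agree to about two decimal places, illustrating why the large-np condition matters.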
Assumption Checking & Robustness
Independence: ensured via random sampling (with replacement) or simple random sample from large pop.
Normality diagnostics: histogram, QQ-plot.
Rules of thumb for using t/CLT:
• n \le 15: need (near) normal data & no outliers.
• 15 < n < 40: moderate skew OK; no extreme outliers.
• n \ge 40: t-procedures robust to strong skew; still vulnerable to gross outliers.
Remedies for non-normal small samples
• transform (e.g., log) – iPod song duration example (log-transformed data produced a near-normal QQ-plot).
• non-parametric tests (e.g. sign test).
Large-Sample Inference without Normality Assurance
If n large, can still use Z-type tests/CI by CLT even when distribution unknown.
R package ‘asympTest’ provides asymptotic Z tests.
Example: Calcium in Pregnant Women (n=180)
H0: \mu = 9.5 vs Ha: \mu \ne 9.5; small P-value → strong evidence of a difference.
95 % CI (\sigma unknown, t-based): does not include 9.5, agreeing with the test.
Practical R Functions
z.test(x, mu, sigma.x, alternative, conf.level) from BSDA.
t.test(x, mu, alternative, conf.level).
asymp.test(x, parameter="mean", …) for large-sample Z.
confint(lm(y~1)) quickly produces one-sample t-CI.
P-hacking & Reproducibility
Definition: manipulating analysis to achieve small P.
Solutions: pre-registration, transparency, larger samples, focus on effect size, Bayesian alternatives.
Consulting & Further Help
If analysis beyond expertise: consult statisticians; at UNSW use STATS CENTRAL.
Keywords (Chapter 6)
Hypothesis testing, Null/Alternative hypothesis, Test statistic, Null distribution, P-value, One-/Two-sided, Z-test, t-test, Significance level, Central Limit Theorem, Proof by contradiction, Robustness, P-hacking.
Essential Take-Home Messages
Hypothesis testing mirrors logical contradiction: assume H0, seek strong evidence against it.
P-value is an evidence metric, not truth probability.
Z vs t hinges on knowledge of ; with large n the distinction fades.
CLT underpins nearly all large-sample inference, making normal theory pervasive.
Always verify assumptions; transform or adopt non-parametric methods when violated.
Avoid binary “significant/not” language; report full context, uncertainties, and magnitude.