Lecture 7 – Inference for a Population Proportion (Complete Study Notes)
Lecture 1 – Sample Proportions
• Overarching course aim: introduce Statistics as the science of collecting, analysing & interpreting data.
• Last lecture recap: Central Limit Theorem (CLT), assumption checking, inference for a mean.
• This week’s roadmap
Sample proportions
Sampling distribution of counts & proportions
Confidence intervals for a proportion
Hypothesis tests for a proportion
Why move from means to proportions?
• Real-world variables are often binary (yes/no, success/failure) rather than quantitative & normal.
• CLT still provides an approximate normal distribution for the sample mean of Bernoulli trials when n is large.
• Statistical tools always rely on assumptions → must diagnose validity (e.g., normal quantile plot, sample size).
Motivating example – “Mike Promesses” poll
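• Candidate needs more than 50 % of votes to win.
• Poll: n = 200 respondents, x = 108 in favour, \hat{p} = 108/200 = 0.54.
• Desired outputs:
95 % Confidence Interval (CI) for true population proportion p.
Hypothesis test of H0: p = 0.5 vs Ha: p > 0.5.
• R quick look: prop.test(x = 108, n = 200, p = 0.5, alternative="greater") → reports the p-value and a confidence interval for p.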
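The R call quoted above can be run as follows. This is a minimal sketch using only the poll counts given in the notes; the variable names are illustrative. Note that prop.test applies a continuity correction by default and reports a score-type interval rather than the Wald interval, so its output will differ slightly from hand calculations.

```r
# Poll data from the notes: 108 of 200 respondents in favour.
res <- prop.test(x = 108, n = 200, p = 0.5, alternative = "greater")

p_hat   <- unname(res$estimate)  # sample proportion 108/200
p_value <- res$p.value           # one-sided p-value for Ha: p > 0.5
ci      <- res$conf.int          # one-sided interval: (lower bound, 1)

print(p_hat)
print(p_value)
print(ci)
```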
Binomial model (revision)
• Assumptions (7.1):
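Fixed number of trials n.
Two outcomes (success/failure).
Constant success probability p.
Independent trials.
• PMF: P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}\text{ for }k=0,1,\dots,n.
• Moments: E(X)=np,\quad \mathrm{Var}(X)=np(1-p).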
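As a quick check of the binomial PMF and moments, the sketch below compares the closed-form formula with R's dbinom and recovers the mean and variance by direct summation. The values n = 10 and p = 0.3 are hypothetical and chosen only for illustration.

```r
n <- 10    # hypothetical number of trials
p <- 0.3   # hypothetical success probability
k <- 0:n   # all possible counts

# PMF from the formula and from the built-in function
pmf_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
pmf_builtin <- dbinom(k, size = n, prob = p)

# Moments by direct summation over the PMF
mean_x <- sum(k * pmf_formula)                 # should equal n * p
var_x  <- sum((k - mean_x)^2 * pmf_formula)    # should equal n * p * (1 - p)

print(all.equal(pmf_formula, pmf_builtin))
print(c(mean = mean_x, var = var_x))
```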
Recognising a binomial
• Whenever we count how many "successes" occur in a random sample of fixed size, the resulting RV is binomial (provided the trials are independent with constant success probability).
• Examples: number of Australians favouring emission cuts, number of female students in SRS of size 4, etc.
Worked exercise – Australian Open Women final
• Assume . Sample → .
(verified by dbinom); full distribution table presented.
• Moments: .
Definition 7.2 – Sample proportion
• \hat{p}=X/n, where X is the number of successes in a sample of size n.
Lecture 2 – Sampling Distribution of Counts & Proportions
Learning outcomes
• [Ch7-L2-O1] Know the sampling distribution of \hat{p}.
• [Ch7-L2-O2] Understand normal approximation.
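Exact distribution of \hat{p}
• If X\sim\mathrm{Bin}(n,p) then \hat{p}=X/n takes values 0, 1/n, 2/n, \dots, 1 with the same probabilities: P(\hat{p}=k/n)=P(X=k).
• Moments: E(\hat{p})=p,\quad \mathrm{Var}(\hat{p})=p(1-p)/n.
• Standard error (unknown p): SE(\hat{p})=\sqrt{\hat{p}(1-\hat{p})/n}.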
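The exact sampling distribution of the sample proportion can be tabulated directly from dbinom. The sketch below does this for hypothetical values n = 20 and p = 0.4, confirms the moments numerically, and then computes the estimated standard error from a hypothetical observed count x_obs = 9.

```r
n <- 20    # hypothetical sample size
p <- 0.4   # hypothetical true proportion

phat_values <- (0:n) / n          # possible values of the sample proportion
probs       <- dbinom(0:n, n, p)  # same probabilities as the count X

mean_phat <- sum(phat_values * probs)                  # equals p
var_phat  <- sum((phat_values - mean_phat)^2 * probs)  # equals p(1-p)/n

# In practice p is unknown, so the SE plugs in the observed proportion.
x_obs  <- 9                       # hypothetical observed count
p_hat  <- x_obs / n
se_hat <- sqrt(p_hat * (1 - p_hat) / n)

print(c(mean = mean_phat, var = var_phat, se_hat = se_hat))
```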
Example – Lowy emission-reduction poll
•
• Moments for :
• SE:
Normal approximation (Proposition 7.3)
• When n is large with np ≥ 10 and n(1-p) ≥ 10, then approximately X\sim N(np,np(1-p)) and \hat{p}\sim N(p,p(1-p)/n).
• Continuity correction improves accuracy for discrete X:
P(X<k)\approx P\bigl(Y<k-0.5\bigr)\text{ for }Y\sim N(np,np(1-p)).
Example – 50 underground tanks
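• Assume X\sim\mathrm{Bin}(50,0.25), so np=12.5 and \sqrt{np(1-p)}\approx3.062.
• Without continuity correction: P(X<10)\approx P\bigl(Z<\tfrac{10-12.5}{3.062}\bigr)\approx0.2072.
• Exact binomial: P(X\le 9)=0.1637; with continuity correction P\bigl(Z<\tfrac{9.5-12.5}{3.062}\bigr)\approx0.164, almost identical to the exact value.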
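The three probabilities in the tank example can be reproduced in R. This sketch assumes the binomial model with n = 50 and p = 0.25, which is the model implied by the mean 12.5 and standard deviation 3.062 used above.

```r
n <- 50
p <- 0.25
mu    <- n * p                    # 12.5
sigma <- sqrt(n * p * (1 - p))    # about 3.062

exact        <- pbinom(9, n, p)         # exact P(X < 10) = P(X <= 9)
approx_plain <- pnorm(10, mu, sigma)    # normal approximation, no correction
approx_cc    <- pnorm(9.5, mu, sigma)   # with continuity correction

print(round(c(exact = exact, plain = approx_plain, corrected = approx_cc), 4))
```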
CLT connection
• X_1,\dots,X_n are i.i.d. Bernoulli(p) variables with E(X_i)=p and \mathrm{Var}(X_i)=p(1-p).
• Sample mean \bar{X}=\hat{p} → CLT ⇒ normal approximation.
Rule-of-thumb (Assumptions 7.2)
• Use normal approximation if both np and n(1-p) are ≥ 10 (or n\hat{p} and n(1-\hat{p}), with \hat{p} in place of p when p is unknown).
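The rule of thumb is easy to wrap in a small helper. The function name normal_approx_ok and its threshold argument are hypothetical, introduced here only for illustration; when p is unknown, pass the observed proportion instead.

```r
# Returns TRUE when both expected counts reach the threshold (default 10).
normal_approx_ok <- function(n, p, threshold = 10) {
  (n * p >= threshold) && (n * (1 - p) >= threshold)
}

ok_tanks <- normal_approx_ok(50, 0.25)  # tank example: 12.5 and 37.5
ok_small <- normal_approx_ok(20, 0.10)  # hypothetical small case: np = 2

print(c(tanks = ok_tanks, small = ok_small))
```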
Data-science interlude – tidyverse
• Introduced pipe %>%, verbs: filter, mutate, arrange, select, relocate, summarise, group_by, join variants.
• Illustrated using palmerpenguins & tiny datasets (band_members, band_instruments).
Lecture 3 – Confidence Intervals for a Proportion
Learning outcomes
• [Ch7-L3-O1] Know CI formula (Wald).
• [Ch7-L3-O2] Check validity assumptions.
• [Ch7-L3-O3] Recognise common CI pattern.
Wald CI (Definition 7.4)
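Given an SRS of size n with sample proportion \hat{p} and confidence level 100(1-\alpha) %, the Wald CI for p is
\hat{p}\pm z^{*}\sqrt{\tfrac{\hat{p}(1-\hat{p})}{n}},
where z^{*} is the 1-\alpha/2 quantile of N(0,1) (e.g., z^{*}\approx1.96 for 95 %).
Assumptions
Independent observations (SRS or sampling fraction ≤ 10 % of population).
Approximate normality: n\hat{p} ≥ 10 and n(1-\hat{p}) ≥ 10.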
Political poll worked example
• . 95 % CI:
• Interpretation: “We are 95 % confident true Labor support lies between 49.2 % and 52.8 %.”
Additional examples
• Birth-in-spring (108/400) → check n\hat{p}=108 ≥ 10 and n(1-\hat{p})=292 ≥ 10 before computing the CI.
• Australian Open viewing (49/400) → 95 % Wald CI \approx (0.090, 0.155).
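The Wald interval of Definition 7.4 can be sketched as a short function and applied to the counts quoted in these notes (108/200, 108/400 and 49/400). The helper name wald_ci is hypothetical, and a 95 % level is assumed throughout; all three samples satisfy the count-of-10 check.

```r
# Wald confidence interval for a proportion: estimate +/- z* times SE.
wald_ci <- function(x, n, conf_level = 0.95) {
  p_hat  <- x / n
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # about 1.96 for 95 %
  se     <- sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

ci_poll   <- wald_ci(108, 200)   # motivating poll
ci_spring <- wald_ci(108, 400)   # births in spring
ci_tv     <- wald_ci(49, 400)    # Australian Open viewing

print(round(rbind(ci_poll, ci_spring, ci_tv), 3))
```

For the poll this gives roughly 0.54 \pm 0.07, which is the sensibly rounded form recommended under Cautions.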
Pattern recognition
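• All CIs so far: estimate \pm critical value \times standard error.
• Mean with known \sigma: \bar{x}\pm z^{*}\sigma/\sqrt{n}.
• Mean with unknown \sigma: \bar{x}\pm t^{*}s/\sqrt{n}.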
• Proportion: formula above (Wald).
Cautions
• Wald CI can be poor for small n; alternatives: Wilson, Jeffreys, Agresti-Coull (implemented in binomCI).
• CI margin covers random sampling error only – non-response & bias not captured.
• Avoid over-precision; round sensibly (e.g., 54 % \pm 7 %).
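To illustrate the first caution, the sketch below compares the Wald interval with the Wilson score interval for a hypothetical small sample (x = 2 successes out of n = 20). The Wilson formula is written out by hand and cross-checked against prop.test with correct = FALSE, which returns the same score interval in base R.

```r
x <- 2     # hypothetical number of successes
n <- 20    # hypothetical small sample size
p_hat <- x / n
z     <- qnorm(0.975)   # 95 % critical value

# Wald interval: can fall outside [0, 1] for small n or extreme p_hat
wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Wilson score interval, written out explicitly
centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z / (1 + z^2 / n) * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
wilson <- centre + c(-1, 1) * half

# Cross-check against base R (score interval without continuity correction)
wilson_r <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

print(round(rbind(wald, wilson, wilson_r), 4))
```

Here the Wald lower limit is negative, which is impossible for a proportion, while the Wilson interval stays inside [0, 1].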
Lecture 4 – Hypothesis Tests for a Proportion
Learning outcomes
• [Ch7-L4-O1] Perform Z-test for proportion.
• [Ch7-L4-O2] Check assumptions.
• [Ch7-L4-O3] Link tests \leftrightarrow confidence intervals.
• [Ch7-L4-O4] Recognise Wald test statistic pattern.
Z-test statistic (Definition 7.5)
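To test H0: p = p_0 vs alternative Ha (p > p_0, p < p_0 or p \ne p_0), use
Z=\dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}.
If assumptions hold ⇒ Z\sim N(0,1) approximately under H0.
Why SE uses p_0 not \hat{p}
• Under H0 we calculate as if p = p_0, because the p-value is a probability computed assuming the null is true. Therefore the variance uses p_0: \mathrm{Var}(\hat{p})=p_0(1-p_0)/n.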
Assumptions
Independence (SRS & population ≥10×sample).
Approx normality with null value: np_0 ≥ 10 and n(1-p_0) ≥ 10.
Worked examples
Births in spring (108/400): H0: p = 0.25, one-sided Ha:p>0.25; \hat{p} = 108/400 = 0.27.
→ fail to reject H0; no evidence that p exceeds 0.25.
Voting intention change 2013 vs 2016: - , two-sided.
→ very strong evidence of change.
TV ratings (49/400) vs p_0 = 0.0688: \hat{p} = 49/400 = 0.1225.
95 % CI excludes 0.0688 ⇒ same conclusion – MATH1041 students differ.
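The worked examples whose numbers survive in these notes (births in spring and TV ratings) can be reproduced with a small Z-test helper. The function name z_test_prop is hypothetical, and treating the TV ratings test as two-sided is an assumption made for this sketch.

```r
# One-sample Z-test for a proportion; the SE uses the null value p0.
z_test_prop <- function(x, n, p0,
                        alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n
  z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),
    greater   = pnorm(z, lower.tail = FALSE),
    less      = pnorm(z)
  )
  list(p_hat = p_hat, z = z, p_value = p_value)
}

births <- z_test_prop(108, 400, p0 = 0.25, alternative = "greater")
tv     <- z_test_prop(49, 400, p0 = 0.0688)   # two-sided assumed here

print(unlist(births))
print(unlist(tv))
```

The births statistic is below 1, so H0 is not rejected, while the TV ratings statistic is above 4, matching the conclusions stated in the examples.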
Relationship test \leftrightarrow CI (Proposition 7.4)
• For two-sided tests at level \alpha, rejecting H0 ⇔ null value p_0 lies outside the 100(1-\alpha) % CI.
• One-sided tests correspond to one-sided CIs.
Wald statistic pattern
\text{test statistic}=\dfrac{\text{estimate}-\text{null value}}{\text{standard error}}.
Applies to Z-tests (mean or proportion) & t-tests (when \sigma unknown).
Continuity Correction Summary
• Used when approximating discrete binomial by continuous normal.
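• Rule: replace integer cutoff k with k \pm 0.5 depending on inequality: P(X\le k)\approx P(Y\le k+0.5), P(X<k)\approx P(Y<k-0.5), P(X\ge k)\approx P(Y\ge k-0.5), P(X>k)\approx P(Y>k+0.5).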
• Greatly improves tail probability accuracy (demonstrated with tank example).
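The correction rule can be collected into one helper that picks the right half-unit shift for each inequality. The function name binom_normal_cc and its type argument are hypothetical; the demonstration reuses the Bin(50, 0.25) model implied by the tank example and compares an upper-tail probability with the exact value.

```r
# Normal approximation to P(X type k) for X ~ Bin(n, p), with continuity correction.
binom_normal_cc <- function(k, n, p, type = c("<", "<=", ">", ">=")) {
  type  <- match.arg(type)
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  switch(type,
    "<"  = pnorm(k - 0.5, mu, sigma),
    "<=" = pnorm(k + 0.5, mu, sigma),
    ">"  = pnorm(k + 0.5, mu, sigma, lower.tail = FALSE),
    ">=" = pnorm(k - 0.5, mu, sigma, lower.tail = FALSE)
  )
}

lower_tail  <- binom_normal_cc(10, 50, 0.25, "<")    # P(X < 10)
upper_tail  <- binom_normal_cc(10, 50, 0.25, ">=")   # P(X >= 10)
exact_upper <- pbinom(9, 50, 0.25, lower.tail = FALSE)

print(round(c(lower = lower_tail, upper = upper_tail, exact_upper = exact_upper), 4))
```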
Keywords & Take-away Concepts
• Proportion p, sample proportion \hat{p}.
• Binomial distribution & parameters n, p.
• Sampling distribution & CLT.
• Normal approximation criteria np ≥ 10 and n(1-p) ≥ 10.
• Standard error \sqrt{\hat{p}(1-\hat{p})/n}.
• Wald confidence interval.
• Z-test for a single proportion.
• Continuity correction.
• Tidyverse data-wrangling verbs (filter, mutate, …).
Practical R Commands Cheat-Sheet
• Compute exact binomial prob.: dbinom(x,n,p); cumulative pbinom(k,n,p).
• Normal approx: pnorm(value, mean, sd) or qnorm(prob) for quantiles.
• prop.test(x, n, p = p0, alternative="two.sided", conf.level=0.95, correct=TRUE) – gives CI & test (with continuity correction).
• binom.test(x,n,p=p0) – exact binomial test & exact CI.
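• Standard CI by hand:
p_obs <- x / n  # observed sample proportion (x successes in n trials)
z_star <- qnorm(0.975)  # critical value for a 95 % CI
se <- sqrt(p_obs*(1-p_obs)/n)
ci <- p_obs + c(-1,1)*z_star*se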
Common Pitfalls & Best Practices
• Do NOT apply normal approximation when np or n(1-p) < 10. Use exact binomial methods instead.
• Interpret CIs in context; avoid overstating precision.
• Report assumptions explicitly (independence, sampling design, sample size adequacy).
• Remember CI ≠ probability interval for parameter; 95 % refers to long-run coverage.
• Wald CI unreliable for extreme p or small n → prefer Wilson/Agresti-Coull.
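When the count-of-10 rule fails, the exact binomial test from the cheat-sheet is the fallback. The sketch below uses hypothetical data (x = 3 successes in n = 15 trials, null value 0.5), for which np_0 = 7.5 is below 10.

```r
x  <- 3     # hypothetical number of successes
n  <- 15    # hypothetical sample size (too small for the normal approximation)
p0 <- 0.5   # hypothetical null value

exact <- binom.test(x, n, p = p0)   # two-sided exact binomial test

p_value  <- exact$p.value
exact_ci <- as.numeric(exact$conf.int)   # exact (Clopper-Pearson) 95 % interval

print(p_value)
print(exact_ci)
```

Because the null distribution is symmetric when p_0 = 0.5, the two-sided p-value here equals twice the lower-tail probability P(X \le 3).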
Connections to Further Topics
• Two-sample comparisons of proportions will extend these ideas (not covered here).
• \chi^2 tests of independence generalise to multi-category tables.
• Generalised linear models (logistic regression) model proportions with covariates.
Self-Reflection Prompts (Exercise 7.2)
• Which part of the inference workflow (formulating H0/Ha, checking assumptions, computing the test statistic) still feels unclear?
• Identify a binary variable in your discipline and outline how you would design a study, compute \hat{p}, draw a CI, and test a claim.