Theory-Based Confidence Intervals for One Population Proportion – PSTAT 5LS Lecture Notes

Administrative Announcements

Course: PSTAT 5LS – Statistical Literacy (Summer Session)
Current Topic: Slide Set 5 – Theory-Based Inference for a Population Proportion, p
- Today’s coverage begins at slide 31
- Next lecture: Slide Set 6 – Inference for One Mean
Deadlines
• HW 3 due Mon Jul 14 @ 11:59 PM
• HW 4 due Fri Jul 18 @ 11:59 PM
• Exam 1 during lecture on Wed Jul 16
– Covers Slide Sets 1–5 & HW 1–3
Exam 1 logistics
• Format: 15–18 MC + 2–4 free-response
• Bring: writing tool(s), calculator, photo ID
• Formula sheet supplied; you may write on the exam
• Tips: read carefully, include context/units, show work for FR, review answers
Office Hours
• Extra OH at noon, Mon (Zoom) this week
• Encouragement: “Visit us in office hours!”

Hypothesis testing ⇢ evaluates evidence about population parameters (e.g., H0:p=p0)
Estimation ⇢ uses sample statistics to approximate unknown parameters
Point estimate = single best guess for a parameter
• One-proportion case: \hat{p} estimates p
Natural sampling variability ⇒ \hat{p} rarely equals p exactly
• We therefore add a “wiggle room” around the point estimate
• This wiggle room = Margin of Error (MOE)

A Confidence Interval (CI) gives a range of plausible values for the parameter
• Generic structure: \text{point estimate} \;\pm\; \text{margin of error}
Margin of Error formula
• \text{MOE}=\text{multiplier}\times\text{standard error}
• Multiplier = critical value from a probability distribution (typically z for large-sample proportion CI)
• Standard Error (SE) estimates the SD of the sampling distribution of \hat{p}

Standard Error
• Meaning: typical variability of \hat{p} from sample to sample
• Formula (theoretical, in terms of the true p, which is unknown): SE=\sqrt{\dfrac{p(1-p)}{n}}
• Formula (practical—substitute \hat{p}): SE=\sqrt{\dfrac{\hat p(1-\hat p)}{n}}
Margin of Error
• Adjusts SE by desired confidence level
• MOE = z^* \times SE
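The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the inputs (\hat{p}=0.5, n=100, z^*=1.96) are arbitrary values, not data from the lecture.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Estimated SD of the sampling distribution of p-hat."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def margin_of_error(z_star: float, se: float) -> float:
    """Scale the standard error by the critical value z*."""
    return z_star * se


# Illustrative values only (assumed, not from the lecture).
se = standard_error(0.5, 100)
moe = margin_of_error(1.96, se)
print(f"SE = {se:.4f}, MOE = {moe:.4f}")
```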

Because p is unknown, replace it with \hat{p} in the SE
Final form (large-sample, theory-based CI):
\hat{p} \;\pm\; z^{*}\;\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}
Properties
• Higher confidence ⇒ larger z^* ⇒ wider interval
• Lower confidence ⇒ smaller z^* ⇒ narrower interval
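To see how z^* grows with the confidence level, one option is to look up the critical values from the standard normal distribution. The sketch below uses Python's standard library; the helper name z_star is an assumption for this example and is not part of the course software.

```python
from statistics import NormalDist


def z_star(conf_level: float) -> float:
    """Critical value leaving (1 - conf_level) / 2 in each tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```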

Confidence level refers to the long-run success rate of the method, not the probability that a specific interval contains p
• e.g., 95 % CI: If we repeated the study many times, ≈95 % of resulting intervals would include the true p
A single computed CI either does (100 % fact) or does not (0 %) contain p—we just don’t know which
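The long-run idea can be checked with a small simulation. The sketch below assumes a made-up "true" proportion (p = 0.47), a made-up sample size (n = 500), and 2000 repeated samples; none of these settings come from the lecture, and the function name is hypothetical. The share of intervals that contain p should land near the stated confidence level.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(p_true: float, n: int, conf_level: float,
                  reps: int, seed: int = 0) -> float:
    """Fraction of simulated intervals that contain p_true."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p_true <= p_hat + moe:
            hits += 1
    return hits / reps


# Assumed settings for illustration only.
rate = coverage_rate(p_true=0.47, n=500, conf_level=0.95, reps=2000)
print(f"Share of intervals containing p: {rate:.3f}")
```

Each individual interval in the loop either contains p or does not; only the overall share is close to 95 %.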

Independence (or random sampling / random assignment)
• Practically: a simple random sample; when sampling w/o replacement, the population should be at least 10× the sample size
Success–Failure condition
• At least 10 successes & 10 failures in the sample
• Check using n\hat{p} \ge 10 and n(1-\hat{p}) \ge 10
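The Success–Failure check is simple enough to automate. The sketch below is a hypothetical helper (the name and the example counts are invented for illustration) that works from the number of successes x, since n\hat{p}=x and n(1-\hat{p})=n-x.

```python
def success_failure_ok(x: int, n: int, threshold: int = 10) -> bool:
    """True when the sample has at least `threshold` successes and failures."""
    successes = x        # equals n * p-hat
    failures = n - x     # equals n * (1 - p-hat)
    return successes >= threshold and failures >= threshold


# Made-up counts for illustration.
print(success_failure_ok(x=45, n=50))  # 45 successes, 5 failures
print(success_failure_ok(x=30, n=50))  # 30 successes, 20 failures
```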

Check conditions (Independence & Success–Failure)
Calculate CI using \hat{p} \pm z^*SE
Interpret in context (include parameter in words, level of confidence, and numeric bounds)

Simulated plot: many sample proportions (black dots) with their individual CIs (blue lines); true p is a red line
• Some intervals miss the true line; most capture it according to confidence level
Comparison of widths
• 90 % CI ⇒ narrowest
• 95 % CI ⇒ moderate
• 99 % CI ⇒ widest
• Graphs demonstrate trade-off between precision & confidence
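The width comparison can also be reproduced numerically. The sketch below holds the data fixed at assumed values (\hat{p}=0.5, n=400, chosen only for illustration) and changes the confidence level; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(p_hat: float, n: int, conf_level: float) -> float:
    """Full width (upper bound minus lower bound) of the large-sample CI."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)


# Same assumed sample, three confidence levels.
widths = {level: ci_width(0.5, 400, level) for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} CI width: {width:.4f}")
```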

Population: U.S. adults
Sample: n=5136; \hat{p}=0.47 reported “only positive feelings” about math
Conditions
• Independence: random sample (given)
• Success–Failure:
– n\hat{p}=5136(0.47)=2413.9\ge10
– n(1-\hat{p})=5136(0.53)=2722.1\ge10
95 % CI
• SE=\sqrt{\dfrac{0.47(0.53)}{5136}}=0.006964
• MOE = 1.960\times0.006964=0.0136
• Interval: 0.47\pm0.0136=(0.4564,\;0.4836)
Interpretation
• “We are 95 % confident that the true proportion of U.S. adults with only positive feelings about math is between 0.4564 and 0.4836.”
Misconception addressed
• It is incorrect to say there is a 95 % chance that p is inside this particular interval: p is either in it or not, and we do not know which.
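The hand calculation for this example can be double-checked in Python. This is a minimal sketch, with a hypothetical function name, that plugs \hat{p}=0.47 and n=5136 into the large-sample formula; it uses the unrounded z^*, so it should agree with the interval above to within rounding.

```python
import math
from statistics import NormalDist


def one_prop_ci(p_hat: float, n: int, conf_level: float) -> tuple[float, float]:
    """Large-sample CI: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    moe = z * se
    return p_hat - moe, p_hat + moe


lower, upper = one_prop_ci(0.47, 5136, 0.95)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```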

Parameter: p = proportion of all Gen Zers (ages 12–27) who say it’s “very important” to protect oceans, lakes & rivers from pollution
Data: n=2832,\;x=2096 ⇒ \hat{p}=0.740113
Conditions
• Independence: random sample (stated)
• Success–Failure:
– n\hat{p}=2096\ge10
– n(1-\hat{p})=736\ge10
Requested: 98 % CI
• Critical value: z^*=2.326
• SE=\sqrt{\dfrac{0.7401(1-0.7401)}{2832}}\approx0.00824
• MOE=2.326\times0.00824\approx0.0192
• CI: 0.7401\pm0.0218=(0.7209,\;0.7593)
Software confirmation (R, stats250sbi::prop_test) matches the 98 % CI computed by hand
• The output also shows a p-value because the function defaults to testing H0:p=0.5; the main take-away here is the CI
Interpretation Questions (slide 47)
• True statements:
– (b) Range of plausible p values
– (c) 95 % of similarly constructed intervals capture p
– (e) Standard wording of 95 % confidence
• False/misleading:
– (a) Interval does NOT give exact value
– (d) Probability statement about a specific interval is incorrect
Additional prompts
• 95 % CI using same data would be narrower than 98 % CI (slide 48)
• The interval does not show that more than 75 % of Gen Zers hold this view: that claim would require the entire CI to lie above 0.75, but (0.7209, 0.7593) includes values below 0.75 (slide 49)
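Both additional prompts can be explored with the raw counts (x=2096, n=2832). The sketch below, using a hypothetical helper name, computes the 95 % and 98 % intervals side by side and then checks whether the whole 98 % interval sits above 0.75.

```python
import math
from statistics import NormalDist


def ci_from_counts(x: int, n: int, conf_level: float) -> tuple[float, float]:
    """Large-sample CI for p from x successes in n trials."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe


lo95, hi95 = ci_from_counts(2096, 2832, 0.95)
lo98, hi98 = ci_from_counts(2096, 2832, 0.98)
print(f"95% CI: ({lo95:.4f}, {hi95:.4f}), width {hi95 - lo95:.4f}")
print(f"98% CI: ({lo98:.4f}, {hi98:.4f}), width {hi98 - lo98:.4f}")

# The claim "more than 75 %" is supported only if the lower bound exceeds 0.75.
entire_ci_above_075 = lo98 > 0.75
print("Entire 98% CI above 0.75:", entire_ci_above_075)
```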

Function: prop_test(x, n, conf.level) in package stats250sbi
• Inputs: successes (x), sample size (n), desired confidence level
• Outputs: test statistic, p-value, point estimate, and CI
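For readers without R at hand, the same four outputs can be approximated in Python. This is a rough analogue, not the stats250sbi implementation: the function name, the two-sided alternative, and the use of p_0 in the test statistic's standard error are all assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def one_prop_summary(x: int, n: int, conf_level: float = 0.95,
                     p0: float = 0.5) -> dict:
    """Hypothetical analogue: z statistic, two-sided p-value, estimate, CI."""
    p_hat = x / n
    # Test statistic uses the null value p0 in the SE (assumed convention).
    z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-sided (assumed)
    # The CI uses p-hat in the SE, as in the notes.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return {
        "statistic": z_stat,
        "p_value": p_value,
        "estimate": p_hat,
        "conf_int": (p_hat - moe, p_hat + moe),
    }


result = one_prop_summary(x=2096, n=2832, conf_level=0.98)
for name, value in result.items():
    print(name, value)
```

With the Gen Z counts, the interval in `conf_int` should line up with the 98 % CI worked out by hand; the test statistic and p-value refer to H0:p=0.5 and are secondary here.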

Theory-based CI for a single proportion relies on conditions ensuring approx. normality of \hat{p}
Critical value ties statistical certainty to interval width; trade-off between precision and confidence
Always articulate CI in context and avoid probability fallacy for fixed intervals
Next unit: extend inference methods from proportions to population means (t-based techniques)