Theory-Based Confidence Intervals for One Population Proportion – PSTAT 5LS Lecture Notes
Administrative Announcements
- Course: PSTAT 5LS – Statistical Literacy (Summer Session)
- Current Topic: Slide Set 5 – Theory-Based Inference for a Population Proportion, p
- Today’s coverage begins at slide 31
- Next lecture: Slide Set 6 – Inference for One Mean
- Deadlines
• HW 3 due Mon Jul 14 @ 11:59 PM
• HW 4 due Fri Jul 18 @ 11:59 PM
• Exam 1 during lecture on Wed Jul 16
– Covers Slide Sets 1–5 & HW 1–3
- Exam 1 logistics
• Format: 15–18 MC + 2–4 free-response
• Bring: writing tool(s), calculator, photo ID
• Formula sheet supplied; you may write on the exam
• Tips: read carefully, include context/units, show work for FR, review answers
- Office Hours
• Extra OH at noon, Mon (Zoom) this week
• Encouragement: “Visit us in office hours!”
Transition – From Hypothesis Testing to Estimation
- Hypothesis testing ⇢ evaluates evidence about population parameters (e.g., H_0: p = p_0)
- Estimation ⇢ uses sample statistics to approximate unknown parameters
- Point estimate = single best guess for a parameter
• One-proportion case: \hat{p} estimates p
- Natural sampling variability ⇒ \hat{p} rarely equals p exactly
• We therefore add a “wiggle room” around the point estimate
• This wiggle room = Margin of Error (MOE)
Confidence Intervals – Conceptual Framework
- A Confidence Interval (CI) gives a range of plausible values for the parameter
• Generic structure: \text{point estimate} \;\pm\; \text{margin of error}
- Margin of Error formula
• \text{MOE}=\text{multiplier}\times\text{standard error}
• Multiplier = critical value from a probability distribution (for a large-sample proportion CI, z^* from the standard normal distribution)
• Standard Error (SE) estimates the SD of the sampling distribution of \hat{p}
Standard Error vs Margin of Error
- Standard Error
• Meaning: typical variability of \hat{p} from sample to sample
• Formula (in terms of the true p, which is unknown): SE=\sqrt{\dfrac{p(1-p)}{n}}
• Formula (practical: substitute \hat{p}): SE=\sqrt{\dfrac{\hat p(1-\hat p)}{n}}
- Margin of Error
• Scales the SE by a multiplier chosen for the desired confidence level
• MOE = z^* \times SE
- Because p is unknown, replace it with \hat{p} in the SE
- Final form (large-sample, theory-based CI):
\hat{p} \;\pm\; z^{*}\;\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}
- Properties
• Higher confidence ⇒ larger z^* ⇒ wider interval
• Lower confidence ⇒ smaller z^* ⇒ narrower interval
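The final form above translates directly into a few lines of code. This is a minimal sketch in Python (standard library only), assuming z^* is computed with `statistics.NormalDist` instead of being read from a table; the function name `one_prop_ci` is hypothetical. It reuses the Gallup numbers from Worked Example 1 to show the width property.

```python
import math
from statistics import NormalDist


def one_prop_ci(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample CI for p: p_hat ± z* × sqrt(p_hat(1 - p_hat) / n)."""
    # z* leaves (1 - confidence)/2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    # Standard error with p_hat substituted for the unknown p.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Margin of error = multiplier × standard error.
    moe = z_star * se
    return p_hat - moe, p_hat + moe


# Same data at two confidence levels: the higher level gives the wider interval.
lo90, hi90 = one_prop_ci(0.47, 5136, confidence=0.90)
lo99, hi99 = one_prop_ci(0.47, 5136, confidence=0.99)
print(f"90 %: ({lo90:.4f}, {hi90:.4f})   99 %: ({lo99:.4f}, {hi99:.4f})")
```

Only z^* changes between the two calls, so the difference in width comes entirely from the multiplier.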
Common Critical Values (Standard Normal)
- 90 % ⇒ z^*=1.645
- 95 % ⇒ z^*=1.960
- 98 % ⇒ z^*=2.326
- 99 % ⇒ z^*=2.576
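The table values need not be memorized: z^* is the (1 + C)/2 quantile of the standard normal distribution, where C is the confidence level. A small sketch using Python's `statistics.NormalDist` (our choice of tool; the notes' software check uses R), with a hypothetical helper name `critical_value`:

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """z* such that the middle `confidence` area of N(0, 1) lies in (-z*, z*)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} -> z* = {critical_value(level):.3f}")
```

Rounded to three decimals, the loop prints 1.645, 1.960, 2.326 and 2.576, matching the table.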
Interpreting “Confidence”
- Confidence level refers to the long-run success rate of the method, not the probability that a specific interval contains p
• e.g., 95 % CI: if we repeated the study many times, ≈95 % of the resulting intervals would include the true p
- A single computed CI either contains p or does not (probability 1 or 0); we just don’t know which
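The "long-run success rate" idea can be checked by simulation: draw many samples from a population whose p we pretend to know, build a 95 % interval from each, and count how many capture p. This is a sketch under stated assumptions: the true p of 0.30, the sample size of 500 and the 2,000 repetitions are hypothetical values chosen for illustration, not taken from the notes.

```python
import math
import random
from statistics import NormalDist

# Hypothetical setup (illustrative values only).
TRUE_P, N, REPS, CONF = 0.30, 500, 2000, 0.95
Z_STAR = NormalDist().inv_cdf((1 + CONF) / 2)

rng = random.Random(1)  # fixed seed so the run is reproducible
hits = 0
for _ in range(REPS):
    # One simulated study: N yes/no responses, each "yes" with probability TRUE_P.
    successes = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    moe = Z_STAR * math.sqrt(p_hat * (1 - p_hat) / N)
    # Each individual interval either contains TRUE_P or it does not.
    if p_hat - moe <= TRUE_P <= p_hat + moe:
        hits += 1

coverage = hits / REPS
print(f"{coverage:.3f} of the {REPS} intervals captured p")
```

The printed coverage should land near 0.95, while any single interval in the loop is simply a hit or a miss.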
Conditions for Valid One-Proportion CI (Theory-Based)
- Independence (or random sampling / random assignment)
• Practically: a simple random sample; when sampling without replacement, the population should be at least 10× the sample size
- Success–Failure condition
• At least 10 successes & 10 failures in the sample
• Check using n\hat{p} \ge 10 and n(1-\hat{p}) \ge 10
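The numeric parts of these conditions are easy to automate; whether the data come from a random sample cannot be checked from the numbers and remains a judgment about the study design. A minimal sketch with a hypothetical function name `conditions_ok`, where the optional `population_size` argument applies the 10× guideline for sampling without replacement:

```python
from typing import Optional


def conditions_ok(n: int, p_hat: float, population_size: Optional[int] = None) -> bool:
    """Numeric checks only; randomness of the sample must be judged from the design."""
    # Success–failure: at least 10 successes and 10 failures in the sample.
    success_failure = n * p_hat >= 10 and n * (1 - p_hat) >= 10
    # 10× guideline for sampling without replacement (checked only if the size is known).
    ten_percent = population_size is None or population_size >= 10 * n
    return success_failure and ten_percent


print(conditions_ok(5136, 0.47))  # Gallup example: both counts far above 10
print(conditions_ok(40, 0.20))    # only 8 successes, so the check fails
```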
Three-Step CI Procedure
- Check conditions (Independence & Success–Failure)
- Calculate CI using \hat{p} \pm z^*SE
- Interpret in context (include parameter in words, level of confidence, and numeric bounds)
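The three steps can be chained into one helper that refuses to compute when the success–failure condition fails and otherwise returns an interpretation sentence. This is a sketch, not course software: the name `three_step_ci` and the survey numbers in the usage line (120 "yes" answers out of 400) are hypothetical, and independence is assumed rather than checked.

```python
import math
from statistics import NormalDist


def three_step_ci(x: int, n: int, confidence: float, parameter: str) -> str:
    """Check, calculate, interpret; `parameter` describes p in words."""
    # Step 1: success–failure check (independence is judged from the study design).
    if x < 10 or n - x < 10:
        raise ValueError("success-failure condition not met; theory-based CI not appropriate")
    # Step 2: p_hat ± z* × SE.
    p_hat = x / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - moe, p_hat + moe
    # Step 3: interpretation with parameter, confidence level, and numeric bounds.
    return (f"We are {round(confidence * 100)} % confident that {parameter} "
            f"is between {lower:.4f} and {upper:.4f}.")


# Hypothetical survey: 120 "yes" answers out of 400 randomly sampled students.
sentence = three_step_ci(120, 400, 0.95, "the proportion of all students who would answer yes")
print(sentence)
```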
Visual Insights (Slides 36–37)
- Simulated plot: many sample proportions (black dots) with their individual CIs (blue lines); true p is a red line
• Some intervals miss the true line; most capture it, at roughly the rate set by the confidence level
- Comparison of widths
• 90 % CI ⇒ narrowest
• 95 % CI ⇒ moderate
• 99 % CI ⇒ widest
• Graphs demonstrate trade-off between precision & confidence
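The width comparison can be reproduced numerically. In this sketch the sample (\hat{p}=0.40, n=300) is hypothetical; the full width of each interval is 2 × MOE.

```python
import math
from statistics import NormalDist

# Hypothetical sample: p_hat = 0.40 from n = 300 (illustrative values only).
p_hat, n = 0.40, 300
se = math.sqrt(p_hat * (1 - p_hat) / n)

widths = {}
for conf in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    widths[conf] = 2 * z_star * se  # full width = 2 × margin of error
    print(f"{conf:.0%} CI width: {widths[conf]:.4f}")
```

Because the SE is shared, the 99 % interval is always about 2.576/1.645 ≈ 1.57 times as wide as the 90 % interval, whatever \hat{p} and n are: the extra confidence is paid for entirely through z^*.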
Worked Example 1 – Gallup Poll on Positive Math Feelings
- Population: U.S. adults
- Sample: n=5136; \hat{p}=0.47 (proportion who reported “only positive feelings” about math)
- Conditions
• Independence: random sample (given)
• Success–Failure:
– n\hat{p}=5136(0.47)=2413.9\ge10
– n(1-\hat{p})=5136(0.53)=2722.1\ge10
- 95 % CI
• SE=\sqrt{\dfrac{0.47(0.53)}{5136}}=0.006964
• MOE = 1.960\times0.006964=0.0136
• Interval: 0.47\pm0.0136=(0.4564,\;0.4836)
- Interpretation
• “We are 95 % confident that the true proportion of U.S. adults with only positive feelings about math is between 0.4564 and 0.4836.”
- Misconception addressed
• The probability that p lies in this particular interval is not 95 %: p is either in it or not, and we don’t know which.
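The Example 1 arithmetic can be retraced step by step. This sketch uses exactly the numbers given above (n=5136, \hat{p}=0.47, z^*=1.960); only the variable names are ours.

```python
import math

# Numbers from Worked Example 1 (Gallup): n = 5136, p_hat = 0.47, 95 % level.
n, p_hat, z_star = 5136, 0.47, 1.960

# Success–failure check: n·p_hat and n·(1 - p_hat).
successes, failures = n * p_hat, n * (1 - p_hat)

# Standard error, margin of error, and interval.
se = math.sqrt(p_hat * (1 - p_hat) / n)
moe = z_star * se
lower, upper = p_hat - moe, p_hat + moe

print(f"n*p_hat = {successes:.1f}, n*(1 - p_hat) = {failures:.1f}")
print(f"SE = {se:.6f}, MOE = {moe:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```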
Worked Example 2 – Gen Z & Water Pollution (Walton Family Foundation / Gallup)
- Parameter: p = proportion of all Gen Zers (ages 12–27) who say it’s “very important” to protect oceans, lakes & rivers from pollution
- Data: n=2832,\;x=2096 ⇒ \hat{p}=0.740113
- Conditions
• Independence: random sample (stated)
• Success–Failure:
– n\hat{p}=2096\ge10
– n(1-\hat{p})=736\ge10
- Requested: 98 % CI
• Critical value: z^*=2.326
• SE=\sqrt{\dfrac{0.7401(1-0.7401)}{2832}}\approx0.00824
• MOE=2.326\times0.00824\approx0.0192
• CI: 0.7401\pm0.0192=(0.7209,\;0.7593)
- Software confirmation (R, stats250sbi::prop_test) matches the 98 % CI output
• A p-value also appears because the function defaults to H_0: p=0.5, but the main takeaway here is the CI
- Interpretation Questions (slide 47)
• True statements:
– (b) Range of plausible p values
– (c) 95 % of similarly constructed intervals capture p
– (e) Standard wording of 95 % confidence
• False/misleading:
– (a) Interval does NOT give exact value
– (d) Probability statement about a specific interval is incorrect
- Additional prompts
• 95 % CI using same data would be narrower than 98 % CI (slide 48)
• The interval does not show that more than 75 % of Gen Zers hold this view: that would require the entire CI to lie above 0.75, and the 98 % CI (0.7209, 0.7593) includes values below 0.75 (slide 49)
- Function: prop_test(x, n, conf.level) in package stats250sbi
• Inputs: successes (x), sample size (n), desired confidence level
• Outputs: test statistic, p-value, point estimate, and CI
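For readers without R, the Example 2 interval and the two follow-up prompts can be retraced in Python. This is not the stats250sbi code: it is a sketch that applies the same large-sample formula, with z^* taken from `statistics.NormalDist`, so we expect it to agree with the software's 98 % interval to four decimals but have not checked the package internals.

```python
import math
from statistics import NormalDist

# Numbers from Worked Example 2: x = 2096 of n = 2832 Gen Zers.
x, n = 2096, 2832
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)


def interval(conf: float) -> tuple[float, float]:
    """p_hat ± z* × SE at the given confidence level."""
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    return p_hat - z_star * se, p_hat + z_star * se


ci98 = interval(0.98)
ci95 = interval(0.95)

print(f"p_hat = {p_hat:.6f}, SE = {se:.5f}")
print(f"98 % CI: ({ci98[0]:.4f}, {ci98[1]:.4f})")
print(f"95 % CI: ({ci95[0]:.4f}, {ci95[1]:.4f})")
# Slide 49 prompt: only an interval lying entirely above 0.75 would support "> 75 %".
print("Entire 98 % CI above 0.75?", ci98[0] > 0.75)
```

The 95 % interval comes out narrower than the 98 % one, and the 98 % lower bound sits below 0.75, matching the answers to the slide 48 and slide 49 prompts.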
Key Takeaways & Looking Ahead
- The theory-based CI for a single proportion relies on conditions ensuring that the sampling distribution of \hat{p} is approximately normal
- The critical value ties the confidence level to interval width: higher confidence costs precision
- Always state the CI in context, and avoid assigning a probability to a single computed interval
- Next unit: extend inference methods from proportions to population means (t-based techniques)