Theory-Based Confidence Intervals for One Population Proportion – PSTAT 5LS Lecture Notes

Administrative Announcements

  • Course: PSTAT 5LS – Statistical Literacy (Summer Session)
  • Current Topic: Slide Set 5 – Theory-Based Inference for a Population Proportion, p
    • Today’s coverage begins at slide 31
    • Next lecture: Slide Set 6 – Inference for One Mean
  • Deadlines
    • HW 3 due Mon Jul 14 @ 11:59 PM
    • HW 4 due Fri Jul 18 @ 11:59 PM
    • Exam 1 during lecture on Wed Jul 16
    – Covers Slide Sets 1–5 & HW 1–3
  • Exam 1 logistics
    • Format: 15–18 MC + 2–4 free-response
    • Bring: writing tool(s), calculator, photo ID
    • Formula sheet supplied; you may write on the exam
    • Tips: read carefully, include context/units, show work for FR, review answers
  • Office Hours
    • Extra OH at noon, Mon (Zoom) this week
    • Encouragement: “Visit us in office hours!”

Transition – From Hypothesis Testing to Estimation

  • Hypothesis testing ⇢ evaluates evidence about population parameters (e.g., H_0: p = p_0)
  • Estimation ⇢ uses sample statistics to approximate unknown parameters
  • Point estimate = single best guess for a parameter
    • One-proportion case: \hat{p} estimates p
  • Natural sampling variability ⇒ \hat{p} rarely equals p exactly
    • We therefore add a “wiggle room” around the point estimate
    • This wiggle room = Margin of Error (MOE)

Confidence Intervals – Conceptual Framework

  • A Confidence Interval (CI) gives a range of plausible values for the parameter
    • Generic structure: \text{point estimate} \;\pm\; \text{margin of error}
  • Margin of Error formula
    • \text{MOE}=\text{multiplier}\times\text{standard error}
    • Multiplier = critical value from a probability distribution (typically z for large-sample proportion CI)
    • Standard Error (SE) estimates the SD of the sampling distribution of the point estimate

Standard Error vs Margin of Error

  • Standard Error
    • Meaning: typical variability of \hat{p} from sample to sample
    • Formula (theoretical; requires the true p, which is unknown): SE=\sqrt{\dfrac{p(1-p)}{n}}
    • Formula (practical—substitute \hat{p}): SE=\sqrt{\dfrac{\hat p(1-\hat p)}{n}}
  • Margin of Error
    • Adjusts SE by desired confidence level
    • MOE = z^* \times SE
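
The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the inputs (\hat{p}=0.50, n=100) are hypothetical and not from the slides.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Estimated SE of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def margin_of_error(z_star: float, se: float) -> float:
    """MOE = critical value (multiplier) times standard error."""
    return z_star * se


# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
se = standard_error(0.50, 100)      # sqrt(0.25 / 100) = 0.05
moe = margin_of_error(1.960, se)    # 1.960 * 0.05 = 0.098
print(f"SE = {se:.3f}, MOE = {moe:.3f}")
```

The SE depends only on the data (\hat{p} and n); the confidence level enters only through the multiplier z^*.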

One-Proportion Confidence Interval – Working Formula

  • Because p is unknown, replace it with \hat{p} in the SE
  • Final form (large-sample, theory-based CI):
    \hat{p} \;\pm\; z^{*}\;\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}
  • Properties
    • Higher confidence ⇒ larger z^* ⇒ wider interval
    • Lower confidence ⇒ smaller z^* ⇒ narrower interval
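
A minimal sketch of the working formula, showing how the interval widens as z^* grows. The function name one_proportion_ci and the inputs (\hat{p}=0.60, n=400) are hypothetical, chosen only for illustration.

```python
import math


def one_proportion_ci(p_hat: float, n: int, z_star: float) -> tuple[float, float]:
    """Large-sample CI: p_hat +/- z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    moe = z_star * se
    return p_hat - moe, p_hat + moe


# Same hypothetical data, three confidence levels.
widths = []
for level, z_star in [(90, 1.645), (95, 1.960), (99, 2.576)]:
    lower, upper = one_proportion_ci(0.60, 400, z_star)
    widths.append(upper - lower)
    print(f"{level}% CI: ({lower:.4f}, {upper:.4f}), width = {upper - lower:.4f}")
```

Because \hat{p} and n are held fixed, the only thing changing across the three rows is the multiplier, so the widths increase with the confidence level.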

Common Critical Values (Standard Normal)

  • 90 % ⇒ z^*=1.645
  • 95 % ⇒ z^*=1.960
  • 98 % ⇒ z^*=2.326
  • 99 % ⇒ z^*=2.576
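
These multipliers come from the standard normal distribution: for confidence level C, z^* cuts off an area of (1 - C)/2 in each tail. A short sketch using Python's standard library can reproduce the table; the helper name critical_value is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """z* such that the central `confidence_level` area lies within +/- z*."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```

The printed values should agree with the table above to three decimal places.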

Interpreting “Confidence”

  • Confidence level refers to the long-run success rate of the method, not the probability that a specific interval contains p
    • e.g., 95 % CI: If we repeated the study many times, ≈95 % of resulting intervals would include the true p
  • A single computed CI either does (100 % fact) or does not (0 %) contain p—we just don’t know which
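
The long-run idea can be illustrated with a small simulation: pick a "true" p, draw many samples, build a 95 % CI from each, and count how often the interval contains p. Everything here (the true p of 0.47, n=500, 2000 repetitions, the seed, and the function name) is an assumption made for illustration, not part of the slides.

```python
import math
import random


def simulate_coverage(true_p: float, n: int, z_star: float,
                      n_studies: int, seed: int = 0) -> float:
    """Fraction of simulated CIs that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # One simulated study: n independent yes/no responses.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / n_studies


coverage = simulate_coverage(true_p=0.47, n=500, z_star=1.960, n_studies=2000)
print(f"Share of intervals containing the true p: {coverage:.3f}")
```

The share should land near 0.95 but not exactly on it, and it shifts slightly with the seed. Each individual interval in the simulation either contains 0.47 or it does not; the 95 % describes the procedure as a whole.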

Conditions for Valid One-Proportion CI (Theory-Based)

  1. Independence (or random sampling / random assignment)
    • Practically: simple random sample, or population ≥10× sample size for sampling w/o replacement
  2. Success–Failure condition
    • At least 10 successes & 10 failures in the sample
    • Check using n\hat{p} \ge 10 and n(1-\hat{p}) \ge 10
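
The numeric parts of these conditions are easy to check mechanically. The sketch below uses hypothetical helper names and made-up counts; whether the sample was actually random still has to be judged from the study design, not from code.

```python
def success_failure_ok(successes: int, n: int, minimum: int = 10) -> bool:
    """True when the sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def population_large_enough(population_size: int, n: int, factor: int = 10) -> bool:
    """True when the population is at least `factor` times the sample size."""
    return population_size >= factor * n


# Hypothetical counts for illustration.
print(success_failure_ok(30, 50))            # 30 successes, 20 failures
print(success_failure_ok(8, 50))             # only 8 successes
print(population_large_enough(20_000, 500))  # 20,000 >= 10 * 500
```

Note that n\hat{p} is simply the number of successes and n(1-\hat{p}) the number of failures, so the check works directly on the counts.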

Three-Step CI Procedure

  1. Check conditions (Independence & Success–Failure)
  2. Calculate CI using \hat{p} \pm z^*SE
  3. Interpret in context (include parameter in words, level of confidence, and numeric bounds)
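
One way to keep the three steps together is a single function that refuses to compute the interval unless the conditions hold. This is a sketch under stated assumptions: the function name, its arguments, and the example data (120 successes out of 300) are hypothetical.

```python
import math
from statistics import NormalDist


def three_step_ci(successes: int, n: int, confidence_level: float,
                  parameter_description: str, random_sample: bool):
    """Check conditions, compute the CI, and return an interpretation."""
    # Step 1: check conditions.
    if not random_sample:
        raise ValueError("Independence condition not met.")
    if successes < 10 or n - successes < 10:
        raise ValueError("Success-failure condition not met.")

    # Step 2: calculate p_hat +/- z* x SE.
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - moe, p_hat + moe

    # Step 3: interpret in context.
    sentence = (f"We are {confidence_level:.0%} confident that "
                f"{parameter_description} is between {lower:.4f} and {upper:.4f}.")
    return lower, upper, sentence


lower, upper, sentence = three_step_ci(
    120, 300, 0.95, "the true proportion of students who prefer mornings", True)
print(sentence)
```

The interpretation sentence names the parameter in words, states the confidence level, and gives both bounds, which are the three pieces step 3 asks for.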

Visual Insights (Slides 36–37)

  • Simulated plot: many sample proportions (black dots) with their individual CIs (blue lines); true p is a red line
    • Some intervals miss the true line; most capture it according to confidence level
  • Comparison of widths
    • 90 % CI ⇒ narrowest
    • 95 % CI ⇒ moderate
    • 99 % CI ⇒ widest
    • Graphs demonstrate trade-off between precision & confidence

Worked Example 1 – Gallup Poll on Positive Math Feelings

  • Population: U.S. adults
  • Sample: n=5136; sample proportion reporting “only positive feelings” about math: \hat{p}=0.47
  • Conditions
    • Independence: random sample (given)
    • Success–Failure:
    – n\hat{p}=5136(0.47)=2413.9\ge10
    – n(1-\hat{p})=5136(0.53)=2722.1\ge10
  • 95 % CI
    • SE=\sqrt{\dfrac{0.47(0.53)}{5136}}=0.006964
    • MOE = 1.960\times0.006964=0.0136
    • Interval: 0.47\pm0.0136=(0.4564,\;0.4836)
  • Interpretation
    • “We are 95 % confident that the true proportion of U.S. adults with only positive feelings about math is between 0.4564 and 0.4836.”
  • Misconception addressed
    • Chance that p is inside this interval is not 95 %—it’s either in or out (unknown).
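
The arithmetic in this example can be re-run with a few lines of Python, using the values given above (n=5136, \hat{p}=0.47, z^*=1.960). The variable names are illustrative.

```python
import math

n = 5136
p_hat = 0.47
z_star = 1.960  # 95 % confidence

# Success-failure counts implied by p_hat and n.
expected_successes = n * p_hat
expected_failures = n * (1 - p_hat)

se = math.sqrt(p_hat * (1 - p_hat) / n)
moe = z_star * se
lower, upper = p_hat - moe, p_hat + moe

print(f"n*p_hat = {expected_successes:.1f}, n*(1 - p_hat) = {expected_failures:.1f}")
print(f"SE = {se:.6f}, MOE = {moe:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

With a sample this large the margin of error is under 1.5 percentage points, which is why the interval is so narrow.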

Worked Example 2 – Gen Z & Water Pollution (Walton Family Foundation / Gallup)

  • Parameter: p = proportion of all Gen Zers (ages 12–27) who say it’s “very important” to protect oceans, lakes & rivers from pollution
  • Data: n=2832,\;x=2096 ⇒ \hat{p}=0.740113
  • Conditions
    • Independence: random sample (stated)
    • Success–Failure:
    – n\hat{p}=2096\ge10
    – n(1-\hat{p})=736\ge10
  • Requested: 98 % CI
    • Critical value: z^*=2.326
    • SE=\sqrt{\dfrac{0.7401(1-0.7401)}{2832}}\approx0.00824
    • MOE=2.326\times0.00824\approx0.0192
    • CI: 0.7401\pm0.0218=(0.7209,\;0.7593)
  • Software confirmation (R, stats250sbi::prop_test) matches the 98 % CI output
    • The output also shows a p-value because the function defaults to testing H_0: p = 0.5; the main take-away here is the CI
  • Interpretation Questions (slide 47)
    • True statements:
    – (b) Range of plausible p values
    – (c) 98 % of similarly constructed intervals capture p
    – (e) Standard wording of 98 % confidence
    • False/misleading:
    – (a) Interval does NOT give exact value
    – (d) Probability statement about a specific interval is incorrect
  • Additional prompts
    • 95 % CI using same data would be narrower than 98 % CI (slide 48)
    • The 98 % CI (0.7209, 0.7593) contains 0.75, so it does not support the claim that more than 75 % of Gen Zers hold this view; that claim would require the entire interval to lie above 0.75 (slide 49)
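
As a check on this example, the sketch below recomputes the 98 % interval directly from x=2096 and n=2832, then answers the two follow-up prompts (95 % vs 98 % width, and the 0.75 claim). Variable names are illustrative; the endpoints should agree with the interval (0.7209, 0.7593) reported above.

```python
import math

x, n = 2096, 2832
p_hat = x / n

se = math.sqrt(p_hat * (1 - p_hat) / n)

# 98 % interval (z* = 2.326) and, for comparison, the 95 % margin (z* = 1.960).
moe_98 = 2.326 * se
moe_95 = 1.960 * se
lower, upper = p_hat - moe_98, p_hat + moe_98

print(f"p_hat = {p_hat:.6f}, SE = {se:.5f}")
print(f"98% MOE = {moe_98:.4f}, CI = ({lower:.4f}, {upper:.4f})")
print(f"95% MOE = {moe_95:.4f} (narrower than the 98% margin: {moe_95 < moe_98})")

# The claim "more than 75 %" is supported only if the whole interval exceeds 0.75.
print(f"Entire 98% CI above 0.75? {lower > 0.75}")
```

Since 0.75 sits inside the interval, values on both sides of 75 % remain plausible for p.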

Computational Tooling (R)

  • Function: prop_test(x, n, conf.level) in package stats250sbi
    • Inputs: successes (x), sample size (n), desired confidence level
    • Outputs: test statistic, p-value, point estimate, and CI
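
For readers without R at hand, here is a rough Python sketch that produces the same four kinds of output. It is not the stats250sbi implementation: the function name, the two-sided alternative, and the use of the null value p_0 in the test statistic's standard error are assumptions, and the package's own defaults may differ.

```python
import math
from statistics import NormalDist


def one_proportion_summary(x: int, n: int, p0: float = 0.5,
                           conf_level: float = 0.95) -> dict:
    """z test statistic, two-sided p-value, point estimate, and Wald CI."""
    std_normal = NormalDist()
    p_hat = x / n

    # Test statistic: the standard error is computed under H_0 (uses p0).
    z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))

    # Confidence interval: the standard error uses p_hat.
    z_star = std_normal.inv_cdf(1 - (1 - conf_level) / 2)
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)

    return {
        "statistic": z_stat,
        "p_value": p_value,
        "estimate": p_hat,
        "conf_int": (p_hat - moe, p_hat + moe),
    }


result = one_proportion_summary(2096, 2832, conf_level=0.98)
for name, value in result.items():
    print(name, value)
```

The test statistic and the interval use different standard errors (one built from p_0, the other from \hat{p}), which is why the two calculations are kept separate in the code.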

Key Takeaways & Looking Ahead

  • Theory-based CI for a single proportion relies on conditions ensuring approx. normality of \hat{p}
  • Critical value ties statistical certainty to interval width; trade-off between precision and confidence
  • Always articulate CI in context and avoid probability fallacy for fixed intervals
  • Next unit: extend inference methods from proportions to population means (t-based techniques)