PSTAT 5LS – Theory-Based Inference for a Population Proportion
Course Logistics & Upcoming Deadlines
- Welcome to PSTAT 5LS
- Current topic: Theory-Based Inference for p (beginning Slide 31)
- Next topic: Inference for One Mean
- Deadlines
- HW 3: Monday July 14 @ 11:59 PM
- HW 4: Friday July 18 @ 11:59 PM
- Exam 1: Wednesday July 16 (during lecture)
- Office Hours
- Second OH this week: Monday @ 12 PM on Zoom
- Coverage
- Format
- 15–18 multiple-choice questions + 2–4 free-response questions
- Write directly on the exam; formula sheet provided
- What to bring
- Pencil/pen, calculator, photo ID (UCSB or other)
- Tips for success
- Read each question carefully
- Include context and units in explanations
- Show complete work (except on MC questions)
- Double-check that every part is answered
Transition from Hypothesis Testing to Estimation
- Hypothesis testing: evaluates evidence against claims about population parameters
- Estimation: uses sample statistics to approximate population parameters
- Point estimate = single number → best guess of the parameter
- For one proportion: \hat p estimates p
- Natural sampling variability → need a margin of error (MOE) to create a range of plausible values
Confidence Intervals – Core Idea
- Provide a range of plausible parameter values
- Generic structure
- \text{Point Estimate} \; \pm \; \text{Margin of Error}
- Margin of Error
- \text{MOE} = (\text{multiplier}) \times (\text{standard error})
- Multiplier = critical value from a probability distribution (usually normal)
- Standard error = estimated SD of the sampling distribution
Standard Error vs. Margin of Error (One Proportion)
- Standard Error (SE)
- Measures typical variability in \hat p
- SE = \sqrt{\dfrac{p(1-p)}{n}} (unknown p will later be replaced by \hat p)
- Margin of Error (MOE)
- Adjusts SE for the desired confidence
- MOE = z^* \times SE
- Because p is unknown, plug in \hat p for both point estimate and SE
- Confidence interval:
\hat p \; \pm \; z^* \times \sqrt{\dfrac{\hat p(1-\hat p)}{n}}
- Higher confidence ⇒ larger z^* ⇒ wider interval
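The interval formula for one proportion can be sketched in base R as follows. The helper name `wald_ci` and its arguments are assumptions made for illustration, not functions from the course materials; the counts are the Gen Z survey values used later in these notes.

```r
# Hypothetical helper (not a course function): CI for one proportion,
# computed as p-hat +/- z* x SE with p-hat plugged into the SE.
wald_ci <- function(x, n, conf_level = 0.95) {
  p_hat  <- x / n                              # point estimate
  z_star <- qnorm(1 - (1 - conf_level) / 2)    # critical value (multiplier)
  se     <- sqrt(p_hat * (1 - p_hat) / n)      # standard error
  moe    <- z_star * se                        # margin of error
  c(lower = p_hat - moe, upper = p_hat + moe)
}

# Same data, two confidence levels: the higher level gives the wider interval.
ci_90 <- wald_ci(x = 2096, n = 2832, conf_level = 0.90)
ci_99 <- wald_ci(x = 2096, n = 2832, conf_level = 0.99)
print(ci_90)
print(ci_99)
```

Only `conf_level` differs between the two calls, so any change in width comes entirely from z^*.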
Visualizing Multiple CIs
- Simulations show:
- Each sample (black dot) has its own \hat p
- Blue lines = corresponding CIs; red line = true p
- Approximately the chosen % of intervals contain the red line
- Changing confidence level
- 90 % → narrow
- 95 % → medium
- 99 % → wide
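A simulation along these lines can be sketched in base R. The true proportion, sample size, number of repetitions, and seed below are all assumed values chosen for illustration; they are not taken from the lecture's simulation.

```r
set.seed(5)        # arbitrary seed so the run is repeatable
true_p <- 0.47     # assumed "true" p (the red line)
n      <- 500      # assumed sample size
reps   <- 2000     # assumed number of simulated samples

# For one confidence level, simulate many samples and record
# how often the interval captures true_p and how wide it is on average.
simulate_cis <- function(conf_level) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  p_hat  <- rbinom(reps, size = n, prob = true_p) / n   # one p-hat per sample
  moe    <- z_star * sqrt(p_hat * (1 - p_hat) / n)
  covered <- (p_hat - moe <= true_p) & (true_p <= p_hat + moe)
  c(coverage = mean(covered), mean_width = mean(2 * moe))
}

conf_levels <- c(0.90, 0.95, 0.99)
results <- sapply(conf_levels, simulate_cis)
colnames(results) <- paste0(conf_levels * 100, "%")
print(round(results, 4))
```

The `coverage` row should land near the chosen confidence level, and the `mean_width` row should increase from 90 % to 99 %.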
Choosing the Critical Value z^*
- Standard normal cut-offs (two-sided)
- 90 % ⇒ z^*=1.645
- 95 % ⇒ z^*=1.960
- 98 % ⇒ z^*=2.326
- 99 % ⇒ z^*=2.576
- Larger z^* gives a bigger MOE to ensure higher confidence
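These cut-offs can be recovered with base R's `qnorm()`. A two-sided interval with confidence level C leaves (1 - C)/2 in each tail, so z^* is the 1 - (1 - C)/2 quantile of the standard normal. The variable names are illustrative.

```r
conf_levels <- c(0.90, 0.95, 0.98, 0.99)

# Upper-tail area is (1 - C) / 2, so take the quantile at 1 - (1 - C) / 2.
z_star <- qnorm(1 - (1 - conf_levels) / 2)
names(z_star) <- paste0(conf_levels * 100, "%")

print(round(z_star, 3))
```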
What Does “Confidence” Mean?
- The confidence level applies to the method, not to a particular interval
- Example: 95 % confidence ⇒ if we repeated the sampling process many times and built an interval from each sample, about 95 % of those intervals would contain p
- Each specific CI either contains p or it doesn’t (probability 0 or 1 after data are collected)
Conditions for Constructing a CI for p
- Independence
- Random sampling/assignment; when sampling without replacement, the sample should also be <10\% of the population
- Success–Failure (S–F) Condition
- Need at least 10 expected successes and 10 expected failures
- For CIs use \hat p: n\hat p \ge 10 and n(1-\hat p) \ge 10
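The success–failure check is simple enough to script. The sketch below uses a hypothetical helper, `check_success_failure`, which is not a course function. The first call uses the Gen Z counts from these notes; the second uses made-up counts purely to show a failing case. Independence still has to be judged from the study design and cannot be checked from the counts alone.

```r
# Hypothetical helper: TRUE when both n * p-hat and n * (1 - p-hat) reach the threshold.
check_success_failure <- function(x, n, threshold = 10) {
  p_hat     <- x / n
  successes <- n * p_hat         # equals the observed count x
  failures  <- n * (1 - p_hat)   # equals n - x
  successes >= threshold && failures >= threshold
}

ok_large <- check_success_failure(x = 2096, n = 2832)  # counts from the Gen Z example
ok_small <- check_success_failure(x = 4, n = 30)       # made-up counts for illustration
print(c(large_sample = ok_large, small_sample = ok_small))
```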
Three-Step CI Procedure
- Check conditions
- Calculate CI with correct z^* and SE
- Interpret in context (mention population, parameter, and confidence level)
Worked Example 1 – Gallup Math Feelings
- Survey: n = 5136 U.S. adults; \hat p = 0.47 reported only positive feelings about math
- Conditions
- Independence: random sample ⇒ satisfied
- S–F: n\hat p = 5136(0.47) = 2413.92 ≥ 10, and n(1-\hat p)=2722.08 ≥ 10 ⇒ satisfied
- 95 % CI
- SE = \sqrt{\dfrac{0.47(0.53)}{5136}} = 0.00696425
- MOE = 1.960 \times 0.00696425 \approx 0.0136
- Interval: 0.47 \pm 0.0136 \; \Rightarrow \; (0.4564,\;0.4836)
- Interpretation (proper wording)
- “We are 95 % confident that between 45.64 % and 48.36 % of all U.S. adults have only positive feelings about math.”
- Once the data are collected, we cannot attach a probability to p being in this specific interval; it either is or isn’t.
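The arithmetic in this example can be checked with a few lines of base R. This is a sketch using the reported n and \hat p; the variable names are illustrative.

```r
n     <- 5136
p_hat <- 0.47

z_star <- qnorm(0.975)                       # 95 % confidence, two-sided
se     <- sqrt(p_hat * (1 - p_hat) / n)      # standard error
moe    <- z_star * se                        # margin of error
ci_95  <- c(lower = p_hat - moe, upper = p_hat + moe)

print(round(c(se = se, moe = moe), 5))
print(round(ci_95, 4))
```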
Worked Example 2 – Gen Z & Water Pollution
- Parameter: p= proportion of all Gen Zers (ages 12–27) who say protecting waters from pollution is very important
- Data: x=2096,\;n=2832,\;\hat p = 0.740113
- Conditions
- Independence: random sample ⇒ satisfied
- S–F: n\hat p = 2832(0.740113)=2096\,(\ge10) and n(1-\hat p)=736\,(\ge10) ⇒ satisfied
- 98 % CI (requires z^*=2.326)
- SE = \sqrt{\dfrac{0.740113(1-0.740113)}{2832}} \approx 0.00824
- MOE = 2.326 \times 0.00824 \approx 0.0192
- CI: 0.7401 \pm 0.0192 \Rightarrow (0.7209,\;0.7593)
- Decision Questions
- 95 % CI would be narrower (smaller z^*)
- Is more than 75 % supported? No: the 98 % CI (0.7209, 0.7593) contains 0.75 and values below it, so the data do not support the claim that p > 0.75 at 98 % confidence.
- Multiple-choice interpretation (Slide 47)
- Correct statements: b, c, e
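The two decision questions can be checked numerically. The sketch below recomputes the 98 % and 95 % intervals from the reported counts in base R, then tests whether 0.75 lies inside the 98 % interval. The helper `moe_for` is hypothetical and defined here only for illustration.

```r
x <- 2096
n <- 2832
p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical helper: margin of error at a given confidence level.
moe_for <- function(conf_level) qnorm(1 - (1 - conf_level) / 2) * se

ci_98 <- p_hat + c(-1, 1) * moe_for(0.98)
ci_95 <- p_hat + c(-1, 1) * moe_for(0.95)

narrower_95  <- diff(ci_95) < diff(ci_98)              # is the 95 % CI narrower?
contains_075 <- ci_98[1] <= 0.75 && 0.75 <= ci_98[2]   # is 0.75 a plausible value?

print(round(rbind(ci_98, ci_95), 4))
print(c(narrower_95 = narrower_95, contains_075 = contains_075))
```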
Using R – prop_test()
prop_test(x = 2096, n = 2832, conf.level = 0.98)
## 1-sample proportions test without continuity correction
## Z = 25.556, p-value < 2.2e-16
## 98 percent confidence interval:
## 0.7209409 0.7592851
## sample estimates:
## p
## 0.740113
- Function automatically calculates CI, Z statistic, and p-value (default null p_0=0.5)
- Ensures reproducibility and quick checks during analysis
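To see where the reported numbers come from, the output above can be reproduced by hand in base R. This is a sketch of the underlying formulas, not the internals of `prop_test()`. It assumes the Z statistic uses the null value p_0 = 0.5 in its standard error and that the p-value is two-sided.

```r
x  <- 2096
n  <- 2832
p0 <- 0.5                                   # default null value noted above
p_hat <- x / n

# Test statistic: the SE under the null hypothesis uses p0, not p-hat.
z_stat  <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- 2 * pnorm(-abs(z_stat))          # assumed two-sided alternative

# 98 % CI: the SE for the interval uses p-hat.
z_star <- qnorm(0.99)
ci_98  <- p_hat + c(-1, 1) * z_star * sqrt(p_hat * (1 - p_hat) / n)

print(round(z_stat, 3))
print(p_value)
print(ci_98)
```

The test and the interval use different standard errors: p_0 for the Z statistic, \hat p for the CI.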
Conceptual Take-Aways
- MOE grows with both z^* and SE
- CIs quantify uncertainty; they do not guarantee the parameter lies inside
- Always verify assumptions before trusting a CI
Looking Forward
- Having formalized theory-based CIs & tests for one proportion, the course will extend these concepts to means (Slide 51 & future lectures).