Simulation-Based Inference for p – Comprehensive Study Notes
Announcements & Course Logistics
- Today’s topic: Simulation-Based Inference for a Population Proportion (p) (starting at slide 13 of Set 3)
- Next class: Normal Distributions (Slide Set 4)
- Upcoming homework deadlines
  - HW 2: Tuesday, July 8, 11:59 PM
  - HW 3: Friday, July 11, 11:59 PM
- Instructor office hours
  - Tuesdays & Thursdays, 2–3 PM via Zoom
  - Students are encouraged to attend with questions about course material, coding, or assignments
Big Picture of Simulation-Based Inference for $p$
- Goal: Decide whether the observed sample evidence suggests that the population proportion differs from a hypothesized value ($p_0$)
- Strategy
  - Model the null hypothesis ($H_0: p = p_0$) with a chance process
  - Simulate many random samples under that model
  - Compare the observed statistic ($\hat{p}$) to the simulated distribution
  - Quantify extremeness via the p-value (the proportion of simulations at least as extreme as the observed statistic)
- Research question: Is the proportion of adults who recycle equal to 0.70?
- Null hypothesis: $H_0: p = 0.70$ (70 % of all adults recycle)
- Alternative: $H_a: p \neq 0.70$ (two-sided because there is no prior directional claim)
- Physical model
  - Colored poker chips represent “recycle” vs “not recycle”
  - Blue chip = recycler, yellow chip = non-recycler
  - Bag composition reflects $p_0$: 70 % blue, 30 % yellow
  - Ten-chip miniature model: 7 blue, 3 yellow (maintains the 7:3 ratio)
- Sampling correspondence
  - One draw → one survey respondent
  - One repetition (800 draws) → one entire sample of 800 adults
  - Drawing with replacement keeps the chance of a blue chip at 0.70 on every draw and makes 800 draws possible from a 10-chip bag
- Note on an earlier example discrepancy
  - One slide referenced 1 207 draws, carried over from the earlier button-pressing context; the current recycling study uses 800 draws
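A minimal base-R sketch of one repetition of the chip model described above. The bag is coded as a character vector, and the object names (`bag`, `draws`, `p_hat_sim`) are illustrative choices, not names from the course materials. `sample()` with `replace = TRUE` mirrors drawing a chip, noting its color, and returning it to the bag.

```r
# Hypothetical 10-chip bag matching p0 = 0.70: 7 blue (recycler), 3 yellow (non-recycler)
bag <- c(rep("blue", 7), rep("yellow", 3))

set.seed(1)  # fixed seed so the sketch is reproducible

# One repetition = 800 draws with replacement = one simulated survey of 800 adults
draws <- sample(bag, size = 800, replace = TRUE)

# Simulated sample proportion of recyclers for this single repetition
p_hat_sim <- mean(draws == "blue")
print(p_hat_sim)
```

A different seed gives a different simulated $\hat{p}$; repeating the repetition thousands of times is what builds the null distribution.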
Observed Data
- Survey of 800 U.S. adults
- 530 said they recycle
- Sample proportion: $\hat{p} = 530/800 = 0.6625$ (66.25 %)
Computer Simulation Procedure
- R helper function:
  - `sim1 <- simulate_chance_model(chanceSuccess = 0.70, numDraws = 800, numRepetitions = 5000)`
  - `chanceSuccess`: $p_0 = 0.70$
  - `numDraws`: sample size $n = 800$ per repetition
  - `numRepetitions`: 5 000 synthetic samples
- Output stored in object sim1 containing the 5 000 simulated sample proportions
- Visualization: a histogram of the 5 000 simulated $\hat{p}$ values shows a roughly bell-shaped distribution centered at 0.70, spanning ≈ 0.64–0.76
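The course helper `simulate_chance_model()` is not part of base R, and its internals are not shown in these notes. The sketch below is a hypothetical stand-in, `simulate_chance_model_sketch()`, which assumes the helper returns a numeric vector of simulated sample proportions. It uses `rbinom()`, whose counts have the same distribution as the number of blue chips in 800 draws with replacement from a 70 % blue bag.

```r
# Hypothetical stand-in for the course helper; assumes it returns a vector of proportions
simulate_chance_model_sketch <- function(chanceSuccess, numDraws, numRepetitions) {
  # Each repetition: number of successes (blue chips) in numDraws draws with replacement
  counts <- rbinom(numRepetitions, size = numDraws, prob = chanceSuccess)
  counts / numDraws  # convert counts to sample proportions
}

set.seed(2025)
sim1 <- simulate_chance_model_sketch(chanceSuccess = 0.70, numDraws = 800,
                                     numRepetitions = 5000)

summary(sim1)  # the center should sit near 0.70

# Histogram of the null distribution, with the observed p-hat marked in red
hist(sim1, breaks = 40, main = "Simulated p-hat under H0: p = 0.70",
     xlab = "Simulated sample proportion")
abline(v = 530 / 800, col = "red", lwd = 2)  # observed p-hat = 0.6625
```

Using `rbinom()` instead of 800 explicit draws per repetition is a design choice for speed; the resulting distribution of simulated proportions is the same.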
Evaluating the Results (Calculating the p-value)
- For a two-sided test, “as extreme” means $|\hat{p}_{\text{sim}} - 0.70| \ge |0.6625 - 0.70| = 0.0375$
  - Left-tail cutoff: $0.70 - 0.0375 = 0.6625$
  - Right-tail cutoff: $0.70 + 0.0375 = 0.7375$
- R code snippets presented
  - `sum(sim1 <= 530/800)` returned 62 simulated samples in the left tail
  - `sum(sim1 >= 0.70 + (0.70 - 530/800))` returned 46 simulated samples in the right tail
- Total extreme count: $62 + 46 = 108$
- Estimated p-value: $108/5000 = 0.0216$
- Interpretation: if $H_0$ is true, there is an estimated 2.16 % chance of obtaining a sample proportion at least 0.0375 away from 0.70
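A base-R sketch of the same two-sided p-value calculation, working with counts of recyclers (530 observed, 560 expected under $H_0$) instead of proportions; integer counts sidestep floating-point rounding at the cutoffs. The seed and variable names are illustrative. Because the random draws differ, the tail counts will not match the 62 and 46 reported above, but the estimated p-value should land near 0.02.

```r
set.seed(2025)
n  <- 800
p0 <- 0.70
observed_count <- 530
p_hat <- observed_count / n  # 0.6625

# Null distribution as counts of recyclers in 5000 simulated samples
null_counts <- rbinom(5000, size = n, prob = p0)

expected_count <- n * p0                          # 560 recyclers expected under H0
distance <- abs(observed_count - expected_count)  # 30 recyclers = 0.0375 as a proportion

left_tail  <- sum(null_counts <= expected_count - distance)  # p-hat <= 0.6625
right_tail <- sum(null_counts >= expected_count + distance)  # p-hat >= 0.7375
p_value <- (left_tail + right_tail) / length(null_counts)

c(left = left_tail, right = right_tail, p_value = p_value)
```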
Interpreting the p-value
- Small p-value (≈ 0.02) ⇒ the observed sample proportion would be unusual if $p = 0.70$
- Conclusion at the common significance level $\alpha = 0.05$:
  - Since 0.0216 < 0.05, reject $H_0$
  - Evidence suggests the true recycling proportion differs from 70 %
- Practical wording: “We have strong evidence that the population proportion of recyclers is not 70 %.”
Vocabulary Check: Three Different “p”s
- $p$: population proportion (parameter)
- $\hat{p}$: sample proportion (statistic)
- p-value: probability of observing a statistic at least as extreme as the one obtained, assuming $H_0$ is true
Role of the Alternative Hypothesis in Tail Choice
- Direction determines which simulated values count as “extreme”
  - $H_a: p < p_0$ ⇒ left tail only
  - $H_a: p > p_0$ ⇒ right tail only
  - $H_a: p \neq p_0$ ⇒ both tails (simulated values at least as far from $p_0$ as $\hat{p}$, in either direction)
- Always decide between a one- and two-sided test before seeing the data; switching after the fact inflates the Type I error rate
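A sketch of how the alternative hypothesis changes which simulated values are counted. `sim_p_value()` is a hypothetical helper; its `alternative` labels borrow the naming convention of R's `binom.test()`, but the function itself is not from base R or the course. A tiny tolerance makes ties with the observed value count as “extreme”.

```r
# Hypothetical helper: simulated p-value for a one-proportion test in any direction
sim_p_value <- function(null_props, p_hat, p0,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  tol <- 1e-9  # so floating-point ties with the observed value count as extreme
  switch(alternative,
    less      = mean(null_props <= p_hat + tol),                     # Ha: p < p0
    greater   = mean(null_props >= p_hat - tol),                     # Ha: p > p0
    two.sided = mean(abs(null_props - p0) >= abs(p_hat - p0) - tol)  # Ha: p != p0
  )
}

set.seed(2025)
null_props <- rbinom(5000, size = 800, prob = 0.70) / 800
p_hat <- 530 / 800

sapply(c("less", "greater", "two.sided"),
       function(alt) sim_p_value(null_props, p_hat, 0.70, alt))
```

Because $\hat{p} = 0.6625$ lies below 0.70, the right-tail (`"greater"`) p-value comes out close to 1, while the two-sided p-value is roughly the left-tail value plus a similar right-tail share.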
Decision-Making with Significance Level α
- Typical thresholds: $\alpha = 0.10$, $0.05$, $0.01$
- Decision rules
  - p-value $\le \alpha$ ⇒ reject $H_0$ ⇒ results called “statistically significant”
  - p-value $> \alpha$ ⇒ fail to reject $H_0$ ⇒ insufficient evidence against $H_0$
- Preferred instructor framing: “strength of evidence” rather than rigid significant/not significant labels
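A short sketch of the decision rule applied to the recycling p-value at the three typical thresholds; `decide()` is a hypothetical helper name. The same evidence leads to rejection at $\alpha = 0.10$ and $0.05$ but not at $0.01$, which is one reason the strength-of-evidence framing is useful.

```r
# Decision rule from the notes: reject H0 when p-value <= alpha
decide <- function(p_value, alpha) {
  ifelse(p_value <= alpha, "Reject H0", "Fail to reject H0")
}

p_value <- 108 / 5000  # 0.0216, the recycling example's estimated p-value
alphas  <- c(0.10, 0.05, 0.01)
data.frame(alpha = alphas, decision = decide(p_value, alphas))
```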
Consequences of Failing to Reject H0
- Does not prove $H_0$ is true; it only means the data were plausible under $H_0$
- Next steps when $H_0$ is not rejected
  - Examine sample size / power
  - Consider whether smaller effects still matter practically
  - Plan a follow-up study with a larger $n$ or an improved design
  - Report findings transparently, noting limitations
- Example cited: Jenner & Jenner (2007), an after-school program study, found no significant test score gains but acknowledged other potential benefits of the program
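A simulation sketch of why sample size matters when $H_0$ is not rejected. The true proportion 0.68, the two sample sizes, and the helper `sim_power()` are hypothetical choices for illustration, not values from the notes. Each simulated study gets its own simulated two-sided p-value against the null distribution, and the fraction of studies with p-value $\le \alpha$ estimates the power.

```r
# Hypothetical power sketch: how often would a simulation-based two-sided test
# at alpha = 0.05 reject H0: p = 0.70 if the true proportion were p_true?
sim_power <- function(n, p_true, p0 = 0.70, alpha = 0.05,
                      n_null = 5000, n_studies = 1000) {
  null_counts <- rbinom(n_null, size = n, prob = p0)   # null distribution (counts)
  expected    <- n * p0                                # expected count under H0
  null_dist   <- abs(null_counts - expected)           # distance from expected count
  study_counts <- rbinom(n_studies, size = n, prob = p_true)  # simulated real studies
  p_values <- vapply(study_counts,
                     function(x) mean(null_dist >= abs(x - expected) - 1e-9),
                     numeric(1))
  mean(p_values <= alpha)  # share of studies rejecting H0 = estimated power
}

set.seed(2025)
c(n_800  = sim_power(800,  p_true = 0.68),
  n_3200 = sim_power(3200, p_true = 0.68))
```

Under these assumptions the larger sample should reject the false $H_0$ far more often, which is the reasoning behind planning a follow-up study with a larger $n$.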
Worked Examples Recap
- Buzz’s Button-Pressing Game (earlier lecture context)
  - Estimated p-value ≈ 0 ⇒ reject $H_0$; strong evidence Buzz wasn’t guessing
- Recycling Proportion
  - p-value $= 0.0216$ ⇒ reject $H_0$; strong evidence the true proportion ≠ 70 %
Summary of Simulation-Based Significance Testing Procedure
- Observe the sample statistic $\hat{p}$
- Simulate many samples under $H_0$ to build the null distribution
- Assess the extremeness of $\hat{p}$ within that distribution via the p-value
- Decide: small p-value ⇒ evidence against $H_0$; large p-value ⇒ data consistent with $H_0$
- Core ideas learned here (model, simulate, compare, decide) will extend to:
  - Means, differences, regression, multiple proportions, etc.
- Mastering these basics prepares you for upcoming topics (e.g., Normal distribution theory next lecture)
- Always pre-specify hypotheses and significance levels; avoid p-hacking
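A compact sketch tying the four steps together for a two-sided one-proportion test. `one_prop_sim_test()` is a hypothetical helper, not a course or base-R function; `alpha` is an argument so that it is fixed before the data are examined, in line with the advice to pre-specify significance levels.

```r
# Hypothetical end-to-end sketch: observe, simulate, assess, decide
one_prop_sim_test <- function(successes, n, p0, alpha = 0.05, reps = 5000) {
  p_hat <- successes / n                               # 1. observe the statistic
  null_props <- rbinom(reps, size = n, prob = p0) / n  # 2. simulate under H0
  # 3. two-sided p-value; tiny tolerance counts floating-point ties as extreme
  p_value <- mean(abs(null_props - p0) >= abs(p_hat - p0) - 1e-9)
  decision <- if (p_value <= alpha) "Reject H0" else "Fail to reject H0"  # 4. decide
  list(p_hat = p_hat, p_value = p_value, decision = decision)
}

set.seed(2025)
one_prop_sim_test(successes = 530, n = 800, p0 = 0.70)
```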