Simulation-Based Inference for p – Comprehensive Study Notes

Today’s topic: Simulation-Based Inference for a Population Proportion $(p)$ (starting at slide 13 of Set 3)
Next class: Normal Distributions (Slide Set 4)
Upcoming homework deadlines
- HW 2: Tuesday, July 8, 11:59 PM
- HW 3: Friday, July 11, 11:59 PM
Instructor office hours
- Tuesdays & Thursdays, 2–3 PM via Zoom
- Encouraged to visit for questions on material, coding, or assignments

Goal: Decide whether observed sample evidence suggests the population proportion differs from a hypothesized value $(p_0)$
Strategy
1. Model the null hypothesis $(H_0 : p = p_0)$ with a chance process
2. Simulate many random samples under that model
3. Compare the observed statistic $(\hat p)$ to the simulated distribution
4. Quantify extremeness via the p-value (proportion of simulations at least as extreme as observed)

Research question: Does the proportion of adults who recycle equal $0.70$ ?
Null hypothesis: $H_0 : p = 0.70$ (70 % of all adults recycle)
Alternative: $H_a : p \neq 0.70$ (two-sided because no prior directional claim)
Physical model
- Use colored poker chips to represent “recycle” vs “not recycle”
- Blue chip = recycler, yellow chip = non-recycler
- Bag composition reflects $p_0$ : 70 % blue, 30 % yellow
- Ten-chip miniature model: 7 blue, 3 yellow (maintains 7∶3 ratio)
Sampling correspondence
- One draw ⇢ one survey respondent
- One repetition (800 draws) ⇢ one entire sample of 800 adults
- Drawing with replacement keeps the chance of a blue chip fixed at $p_0$ on every draw and lets even a ten-chip bag supply all 800 draws
Note on earlier example discrepancy
- A slide referenced 1 207 draws (previous button-pressing context); current recycling study uses 800 draws
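The chip model can be mimicked in base R with `sample()`. This is a minimal sketch assuming the ten-chip bag described above; the variable names and the seed are illustrative choices, not taken from the lecture.

```r
# Ten-chip bag matching p0 = 0.70: 7 blue (recycler), 3 yellow (non-recycler)
bag <- c(rep("blue", 7), rep("yellow", 3))

set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

# One repetition: 800 draws WITH replacement = one simulated sample of 800 adults
draws <- sample(bag, size = 800, replace = TRUE)

# Simulated sample proportion of recyclers for this one repetition
p_hat_sim <- mean(draws == "blue")
p_hat_sim
```

Setting `replace = TRUE` is what allows 800 draws from a bag of only ten chips; with `replace = FALSE`, `sample()` would stop with an error because the requested sample is larger than the bag.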

Survey of 800 U.S. adults
- 530 said they recycle
- Sample proportion: $\hat p = \frac{530}{800} = 0.6625$ (66.25 %)

R helper function: simulate_chance_model(chanceSuccess = 0.70, numDraws = 800, numRepetitions = 5000)
- chanceSuccess: $p_0 = 0.70$
- numDraws: sample size $n = 800$ per repetition
- numRepetitions: 5 000 synthetic samples
Output stored in object sim1 containing the 5 000 simulated sample proportions
Visualization: Histogram of the 5 000 $\hat p$ values shows a roughly bell-shaped distribution centered at $0.70$ , spanning ≈ $0.64$ – $0.76$
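The course helper `simulate_chance_model()` is not reproduced in these notes. As a rough stand-in, the same kind of output can be generated with base R's `rbinom()`. The function name `simulate_props` is hypothetical, and the sketch assumes the helper returns a plain numeric vector of sample proportions.

```r
# Hypothetical stand-in for the course helper; NOT its actual implementation.
# Each repetition counts successes in numDraws independent draws (rbinom),
# then converts that count to a sample proportion.
simulate_props <- function(chanceSuccess, numDraws, numRepetitions) {
  rbinom(numRepetitions, size = numDraws, prob = chanceSuccess) / numDraws
}

set.seed(2025)  # arbitrary seed for reproducibility
sim_demo <- simulate_props(chanceSuccess = 0.70, numDraws = 800,
                           numRepetitions = 5000)

length(sim_demo)  # number of simulated sample proportions
mean(sim_demo)    # should land close to 0.70

# Histogram of the simulated null distribution
hist(sim_demo, breaks = 30,
     main = "Simulated sample proportions under H0",
     xlab = "Simulated p-hat")
```

Because the draws are random, the histogram and any tail counts from this sketch will differ slightly from the values recorded in lecture.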

For a two-sided test, “as extreme” means $|\hat p_{sim} - 0.70| \ge |0.6625 - 0.70| = 0.0375$
- Left-tail cutoff: $0.70 - 0.0375 = 0.6625$
- Right-tail cutoff: $0.70 + 0.0375 = 0.7375$
R code snippets presented
- sum(sim1 <= 530/800) returned 62 simulated samples in left tail
- sum(sim1 >= 0.70 + (0.70 - 530/800)) returned 46 simulated samples in right tail
- Total extreme counts: $62 + 46 = 108$
Estimated p-value: $p\text{-value} = \frac{108}{5000} = 0.0216$
- Interpretation: if $H_0$ is true, only about 2.16 % of random samples of 800 would give a $\hat p$ at least $0.0375$ away from $0.70$
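The two-tail count can be sketched end to end as follows. This version works with simulated counts rather than proportions, so the cutoffs 530 and 590 are exact integers and ties are not lost to floating-point rounding. That is a design choice of this sketch, not something shown in lecture. The simulated values come from `rbinom()` as an assumed stand-in for `sim1`, so the tail counts will not match 62 and 46 exactly.

```r
n <- 800
p0 <- 0.70
observed_count <- 530
expected_count <- round(n * p0)  # 560 recyclers expected under H0

set.seed(42)  # arbitrary seed for reproducibility
sim_counts <- rbinom(5000, size = n, prob = p0)  # assumed stand-in for sim1 * 800

# Observed distance from the null value, in respondents (30 / 800 = 0.0375)
dist_obs <- abs(observed_count - expected_count)

left  <- sum(sim_counts <= expected_count - dist_obs)  # counts <= 530
right <- sum(sim_counts >= expected_count + dist_obs)  # counts >= 590

p_value <- (left + right) / length(sim_counts)
c(left = left, right = right, p_value = p_value)
```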

Small p-value (≈ 0.02) ⇒ observed sample proportion is unusual if $p = 0.70$
Conclusion at common significance level $\alpha = 0.05$ :
- Since 0.0216 < 0.05, reject $H_0$
- Evidence suggests the true recycling proportion differs from 70 %
Practical wording: “We have strong evidence that the population proportion of recyclers is not 70 %.”

$p$ — population proportion (parameter)
$\hat p$ — sample proportion (statistic)
p-value — probability of observing a statistic at least as extreme as the one obtained, assuming $H_0$ is correct

Direction determines which simulated values count as “extreme”
- $H_a : p < p_0$ ⇒ left tail only
- $H_a : p > p_0$ ⇒ right tail only
- $H_a : p \neq p_0$ ⇒ both tails (simulated values at least as far from $p_0$ as $\hat p$, in either direction)
Always decide one- vs two-sided before seeing data; post-hoc switching inflates Type I error
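How the direction of $H_a$ changes which simulated values are counted can be sketched with a small function. The name `sim_p_value` and its arguments are hypothetical, and the tolerance `tol` is an added safeguard so that floating-point ties still count as “at least as extreme”.

```r
# Hypothetical helper: estimate a p-value from a vector of simulated proportions
sim_p_value <- function(sims, p_hat, p0,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  tol <- 1e-9  # tolerance so exact ties are not lost to rounding error
  extreme <- switch(alternative,
    less      = sims <= p_hat + tol,                        # left tail only
    greater   = sims >= p_hat - tol,                        # right tail only
    two.sided = abs(sims - p0) >= abs(p_hat - p0) - tol     # both tails
  )
  mean(extreme)  # proportion of simulations counted as extreme
}

# Tiny hand-checkable example (made-up values, for illustration only)
toy <- c(0.60, 0.66, 0.70, 0.72, 0.75)
sim_p_value(toy, p_hat = 0.66, p0 = 0.70, alternative = "less")       # 2 of 5
sim_p_value(toy, p_hat = 0.66, p0 = 0.70, alternative = "greater")    # 4 of 5
sim_p_value(toy, p_hat = 0.66, p0 = 0.70, alternative = "two.sided")  # 3 of 5
```

The same simulated distribution yields three different p-values depending on the alternative, which is why the direction has to be fixed before looking at the data.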

Typical thresholds: $\alpha = 0.10, 0.05, 0.01$
Decision rules
- $\text{p-value} \le \alpha$ ⇒ Reject $H_0$ ⇒ results called “statistically significant”
- $\text{p-value} > \alpha$ ⇒ Fail to reject $H_0$ ⇒ insufficient evidence
Preferred instructor framing: “strength of evidence” rather than rigid significant/not significant labels
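The decision rule is a single comparison, sketched below with a hypothetical function `decide`. Running it on the recycling p-value at two different thresholds shows why a rigid label can mislead: the evidence is identical, but the label changes with $\alpha$.

```r
# Hypothetical helper applying the decision rule: reject H0 when p-value <= alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "reject H0" else "fail to reject H0"
}

decide(0.0216, alpha = 0.05)  # "reject H0"         since 0.0216 <= 0.05
decide(0.0216, alpha = 0.01)  # "fail to reject H0" since 0.0216 >  0.01
```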

Failing to reject $H_0$ does not prove $H_0$ is true; it means only that the data were plausible under $H_0$
Next steps when $H_0$ isn’t rejected
- Examine sample size / power
- Consider whether smaller effects still matter practically
- Plan a follow-up study with larger $n$ or improved design
- Report findings transparently, noting limitations
Example cited: Jenner & Jenner (2007) after-school program study found no significant test score gains but acknowledged potential other benefits

Buzz’s Button-Pressing Game (earlier lecture context)
- Estimated p-value ≈ 0 ⇒ reject $H_0$ ; strong evidence Buzz wasn’t guessing
Recycling Proportion
- p-value $= 0.0216$ ⇒ reject $H_0$ ; strong evidence true proportion ≠ 70 %

Observe the sample statistic $\hat p$
Simulate many samples under $H_0$ to build the null distribution
Assess extremeness of $\hat p$ within that distribution via the p-value
Decide: small p-value ⇒ evidence against $H_0$ ; large p-value ⇒ data consistent with $H_0$

Core ideas learned here (model, simulate, compare, decide) will extend to:
- Means, differences, regression, multiple proportions, etc.
Mastery of these basics ensures preparedness for upcoming topics (e.g., Normal distribution theory next lecture)
Always pre-specify hypotheses and significance levels; avoid p-hacking