Notes on Hypothesis Tests for 2 Proportions

Comparing Two Proportions

  • Estimating Difference Between Two Proportions:
    • Example: Comparing admission rates at ages 17 and 18.
    • Statistic used: Difference in sample proportions
    • Population proportions:
      • Population 1: proportion p_1, sample size n_1, sample proportion \bar{p}_1
      • Population 2: proportion p_2, sample size n_2, sample proportion \bar{p}_2

Sampling Distribution of Proportions

  • For independent samples of sizes n1 and n2 from populations with parameters p1 and p2:
    • Mean:
    • Mean of the sampling distribution of \bar{p} = p
    • Mean of (\bar{p}_1 - \bar{p}_2) = p_1 - p_2
    • Standard Deviation:
    • Std. deviation of \bar{p} = \sqrt{\frac{p(1 - p)}{n}}
    • Std. deviation of (\bar{p}_1 - \bar{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
    • Normality Conditions:
    • n p and n(1 - p) should both be ≥ 10 for each sample
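
The mean and standard deviation formulas above can be checked with a small simulation. This is a minimal sketch rather than part of the original notes: the values p1 = 0.6, p2 = 0.4, n1 = 150, n2 = 250, the seed, the number of repetitions, and the helper name `sample_proportion` are all illustrative assumptions.

```python
import math
import random
import statistics

# Illustrative (assumed) population proportions and sample sizes.
p1, n1 = 0.6, 150
p2, n2 = 0.4, 250

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_proportion(p, n):
    """Draw n independent Bernoulli(p) trials and return the sample proportion."""
    return sum(rng.random() < p for _ in range(n)) / n


# Simulate many independent pairs of samples and record the difference.
diffs = [sample_proportion(p1, n1) - sample_proportion(p2, n2) for _ in range(2000)]

theoretical_mean = p1 - p2
theoretical_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"mean: simulated {statistics.mean(diffs):.4f}, formula {theoretical_mean:.4f}")
print(f"sd:   simulated {statistics.stdev(diffs):.4f}, formula {theoretical_sd:.4f}")
```

The simulated mean and standard deviation of the differences should land close to the formula values, with the gap shrinking as the number of repetitions grows.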

Assumptions and Conditions

  • Independent Observations:
    • Randomization Condition:
    • Data drawn independently and randomly from a homogeneous population.
    • 10% Condition:
    • Sample should not exceed 10% of population when sampled without replacement.
  • Independent Groups:
    • Two groups must be independent.
  • Sample Size Condition:
    • Each group must be sufficiently large.
    • Success/Failure Condition:
    • n_1 p_1 ≥ 10 and n_1(1 - p_1) ≥ 10
    • n_2 p_2 ≥ 10 and n_2(1 - p_2) ≥ 10
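
The conditions above can be checked mechanically before running any inference. The sketch below is one possible way to do it, under two assumptions: observed counts of successes and failures stand in for n p and n(1 - p), since the true p is unknown, and the population sizes are known. The function name `check_conditions` and all of the numbers are hypothetical.

```python
def check_conditions(successes, n, population_size=None, threshold=10):
    """Return a dict of condition checks for one sample.

    Sample counts stand in for n*p and n*(1 - p), because the
    population proportion p is unknown in practice.
    """
    failures = n - successes
    checks = {
        "success_failure": successes >= threshold and failures >= threshold,
    }
    if population_size is not None:
        # 10% condition: the sample is at most 10% of the population.
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks


# Hypothetical samples (numbers invented for illustration).
group_a = check_conditions(successes=40, n=100, population_size=5000)
group_b = check_conditions(successes=6, n=80, population_size=500)

print(group_a)  # both checks pass
print(group_b)  # both checks fail: only 6 successes, and 80 > 10% of 500
```

Independence between the two groups cannot be verified from counts alone; it depends on how the data were collected.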

Confidence Interval for 2 Population Proportions

  • Formula:
    • (\bar{p}_1 - \bar{p}_2) ± z^* \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}
  • Example:
    • Smokers (n_1 = 150): 95 with prominent wrinkles, \bar{p}_1 = 95/150 \approx 0.63
    • Nonsmokers (n_2 = 250): 105 with prominent wrinkles, \bar{p}_2 = 105/250 = 0.42
    • 95% CI for smokers:
    • 0.63 ± 1.96 × 0.0394 = (0.55, 0.71)
    • 95% CI for nonsmokers:
    • 0.42 ± 1.96 × 0.0312 = (0.36, 0.48)
    • Check for overlap: the two intervals do not overlap, which suggests the population proportions differ.
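
The example gives a separate interval for each group, while the formula at the top of this section targets the difference p_1 - p_2 directly. A minimal sketch of that interval for the same counts follows; the function name `two_proportion_ci` is an assumption made for illustration, and z^* is taken from the standard normal distribution in Python's `statistics` module.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Smokers: 95 of 150; nonsmokers: 105 of 250 (counts from the example above).
low, high = two_proportion_ci(95, 150, 105, 250)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

The resulting interval runs from roughly 0.11 to 0.31 and excludes zero, which agrees with the non-overlapping individual intervals.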

Two-Proportion z Test

  • Hypotheses:
    • Two-tailed: H_0: p_1 - p_2 = p_0 vs H_a: p_1 - p_2 \neq p_0
    • Upper-tailed: H_0: p_1 - p_2 = p_0 vs H_a: p_1 - p_2 > p_0
    • Lower-tailed: H_0: p_1 - p_2 = p_0 vs H_a: p_1 - p_2 < p_0
  • Assumption & Conditions:
    • Random samples, independent observations, large sizes
    • Normality conditions: n_1 p_1, n_1(1 - p_1), n_2 p_2, and n_2(1 - p_2) are all ≥ 10
  • Test Statistic:
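    • z_0 = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}, where p_1 - p_2 is the hypothesized difference p_0
    • If the null hypothesis states p_1 = p_2 (that is, p_0 = 0), pool the proportions:
    • \bar{p}_{pooled} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}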
  • Modified test statistic:
    • z_0 = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_{pooled}(1 - \bar{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
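
A short derivation, added here as a sketch, of where the pooled standard error in the denominator comes from. The symbol p for the common proportion under H_0 is notation introduced only for this derivation.

```latex
\begin{aligned}
SD(\bar{p}_1 - \bar{p}_2)
  &= \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \\
  &= \sqrt{\frac{p(1 - p)}{n_1} + \frac{p(1 - p)}{n_2}}
     && \text{(under } H_0: p_1 = p_2 = p\text{)} \\
  &= \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\
  &\approx \sqrt{\bar{p}_{pooled}(1 - \bar{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
\end{aligned}
```

The last step replaces the unknown common proportion p with its pooled estimate, which uses all n_1 + n_2 observations.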

Decision Making

  • p-value Criteria:
    • For two-tailed: p-value = 2 × P(Z > |z_0|)
    • For upper-tail: p-value = P(Z > z_0)
    • For lower-tail: p-value = P(Z < z_0)
  • Decision rule:
    • If p-value ≤ \alpha, then reject H_0
    • If p-value > \alpha, do not reject H_0
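
The p-value criteria and the decision rule translate directly into code. The sketch below uses the standard normal distribution from Python's `statistics` module; the function names `p_value` and `decide` and the test statistic 1.80 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def p_value(z0, tail="two"):
    """p-value of a z statistic for a 'two', 'upper', or 'lower' tailed test."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "two":
        # abs() makes the two-tailed formula work for negative z0 as well.
        return 2 * (1 - std_normal.cdf(abs(z0)))
    if tail == "upper":
        return 1 - std_normal.cdf(z0)
    if tail == "lower":
        return std_normal.cdf(z0)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"


z0 = 1.80  # hypothetical test statistic, chosen only for illustration
for tail in ("two", "upper", "lower"):
    p = p_value(z0, tail)
    print(f"{tail:>5}-tailed: p-value = {p:.4f} -> {decide(p)}")
```

The same statistic can lead to different decisions depending on the alternative hypothesis, which is why the tail has to be fixed before looking at the data.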

Example Continuation

  • Proportions of smokers and non-smokers with wrinkles at \alpha=0.05:
    • Test statistic:
    • z_0 = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_{pooled}(1 - \bar{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \approx 4.2
    • Pooled proportion:
    • \bar{p}_{pooled} = 0.5
    • p-value calculation:
    • p-value \approx 0 (smaller than 0.0001)
  • Conclusion:
    • Since the p-value is below \alpha = 0.05, reject H_0: there is a significant difference between the proportions of smokers and nonsmokers with prominent wrinkles.
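
The whole example can be reproduced in a few lines. This is a sketch under two assumptions: the test is two-tailed (the conclusion speaks of a "difference"), and the unrounded sample proportion 95/150 is used instead of 0.63. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from the wrinkles example: smokers 95 of 150, nonsmokers 105 of 250.
x1, n1 = 95, 150
x2, n2 = 105, 250
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # equals (n1*p1_hat + n2*p2_hat) / (n1 + n2)

# Pooled standard error, used because H0 states p1 = p2.
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z0 = (p1_hat - p2_hat) / se_pooled

# Two-tailed p-value (an assumption; see the lead-in).
p_val = 2 * (1 - NormalDist().cdf(abs(z0)))

print(f"pooled proportion = {p_pooled:.3f}")
print(f"z0 = {z0:.2f}")
print(f"p-value = {p_val:.6f}")
print("reject H0" if p_val <= alpha else "do not reject H0")
```

With unrounded inputs the pooled statistic comes out near 4.13, a little below the 4.2 quoted above. The gap is probably due to rounding or to the choice of standard error, and the conclusion is the same either way.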