Notes on Hypothesis Tests for 2 Proportions

Comparing Two Proportions

  • Estimating Difference Between Two Proportions:
    • Example: Comparing admission rates at ages 17 and 18.
    • Statistic used: Difference in sample proportions
    • Population proportions:
      • Population 1: proportion $p_1$, sample size $n_1$, sample proportion $\bar{p}_1$
      • Population 2: proportion $p_2$, sample size $n_2$, sample proportion $\bar{p}_2$

Sampling Distribution of Proportions

  • For independent samples of sizes $n_1$ and $n_2$ from populations with parameters $p_1$ and $p_2$:
    • Mean:
    • $\text{Mean of the sampling distribution of } \bar{p} = p$
    • $\text{Mean of } (\bar{p}_1 - \bar{p}_2) = p_1 - p_2$
    • Standard Deviation:
    • $\text{Std. deviation of } \bar{p} = \sqrt{\frac{p(1 - p)}{n}}$
    • $\text{Std. deviation of } (\bar{p}_1 - \bar{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
    • Normality Conditions:
    • $np \ge 10$ and $n(1 - p) \ge 10$ for both samples
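
The mean and standard deviation formulas above can be sketched in a few lines of Python. The function name `diff_sampling_distribution` and the population values passed to it are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def diff_sampling_distribution(p1, n1, p2, n2):
    """Return (mean, std. deviation) of the sampling distribution of p1_bar - p2_bar."""
    mean = p1 - p2
    # Variances of independent sample proportions add; take the root of the sum.
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd

# Hypothetical population proportions and sample sizes (not from the notes).
mean, sd = diff_sampling_distribution(p1=0.6, n1=100, p2=0.5, n2=200)
print(f"mean = {mean:.4f}, sd = {sd:.4f}")
```

Note that the standard deviation of the difference is the square root of the summed variances, not the sum of the two individual standard deviations.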

Assumptions and Conditions

  • Independent Observations:
    • Randomization Condition:
    • Data drawn independently and randomly from a homogeneous population.
    • 10% Condition:
    • Sample should not exceed 10% of population when sampled without replacement.
  • Independent Groups:
    • Two groups must be independent.
  • Sample Size Condition:
    • Each group must be sufficiently large.
    • Success/Failure Condition:
    • $n_1 p_1 \ge 10$ and $n_1(1 - p_1) \ge 10$
    • $n_2 p_2 \ge 10$ and $n_2(1 - p_2) \ge 10$
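
A minimal sketch of the Success/Failure check follows. The conditions above are written with the population proportions, which are unknown in practice, so this sketch assumes the check is done on the observed counts of successes and failures in each sample. The function name `success_failure_ok` and the `threshold` parameter are illustrative, not from the notes.

```python
def success_failure_ok(successes, n, threshold=10):
    """Check that a sample has at least `threshold` successes and failures.

    Assumption: observed counts stand in for n*p and n*(1 - p),
    because the true proportion p is unknown.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Both groups must pass for the normal approximation to be reasonable.
groups = [(95, 150), (105, 250)]  # (successes, sample size) per group
all_ok = all(success_failure_ok(x, n) for x, n in groups)
print("Success/Failure condition met:", all_ok)
```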

Confidence Interval for 2 Population Proportions

  • Formula:
    • $(\bar{p}_1 - \bar{p}_2) \pm z^* \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$
  • Example:
    • Smokers ($n_1 = 150$): 95 with prominent wrinkles, $\bar{p}_1 = 95/150 \approx 0.63$
    • Nonsmokers ($n_2 = 250$): 105 with prominent wrinkles, $\bar{p}_2 = 105/250 = 0.42$
    • 95% CI for smokers:
    • $0.63 \pm 1.96 \times 0.0394 = (0.55, 0.71)$
    • 95% CI for nonsmokers:
    • $0.42 \pm 1.96 \times 0.0312 = (0.36, 0.48)$
    • The two intervals do not overlap, which suggests that the two population proportions differ.
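
The example above reports one interval per group. As a sketch, the two-proportion formula can also be applied directly to the difference $\bar{p}_1 - \bar{p}_2$ using the same counts. The function name `two_prop_ci` is illustrative; `NormalDist` from the Python standard library supplies $z^*$.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from success counts x1, x2."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each sample uses its own proportion.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value z* (about 1.96 for 95% confidence).
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = two_prop_ci(95, 150, 105, 250)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")  # (0.115, 0.312)
```

An interval for the difference that excludes 0 points to the same conclusion as non-overlapping individual intervals, and it answers the comparison question directly.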

Two-Proportion z Test

  • Hypotheses:
    • Two-tailed: $H_0: p_1 - p_2 = p_0$ vs $H_a: p_1 - p_2 \neq p_0$
    • Upper-tailed: $H_0: p_1 - p_2 = p_0$ vs $H_a: p_1 - p_2 > p_0$
    • Lower-tailed: $H_0: p_1 - p_2 = p_0$ vs $H_a: p_1 - p_2 < p_0$
  • Assumption & Conditions:
    • Random samples, independent observations, large sizes
    • Normal Conditions: $n_1 p_1,\ n_1(1 - p_1),\ n_2 p_2,\ n_2(1 - p_2) \ge 10$
  • Test Statistic:
    • $z_0 = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$
    • If the null hypothesis is $p_1 = p_2$ (that is, $p_0 = 0$), pool the sample proportions:
    • $\bar{p}_{pooled} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$
  • Modified test statistic:
    • $z_0 = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_{pooled}(1 - \bar{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
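
The pooled test statistic can be sketched as below, assuming the null hypothesis $p_1 = p_2$. The function name `pooled_z` and the counts in the usage line are hypothetical and serve only to show the calculation.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    # n1*p1_bar + n2*p2_bar is the total number of successes.
    p_pool = (x1 + x2) / (n1 + n2)
    # The standard error takes the square root of the pooled variance.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 40 of 100 successes versus 30 of 100.
z0 = pooled_z(40, 100, 30, 100)
print(f"z0 = {z0:.2f}")
```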

Decision Making

  • p-value Criteria:
    • For two-tailed: p-value = 2 × P(Z > |z_0|)
    • For upper-tail: p-value = P(Z > z_0)
    • For lower-tail: p-value = P(Z < z_0)
  • Decision rule:
    • If p-value ≤ $\alpha$, reject $H_0$
    • If p-value > $\alpha$, do not reject $H_0$
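
The p-value criteria and the decision rule can be sketched with the standard normal distribution from Python's standard library. The function names `p_value` and `decide` and the `tail` labels are illustrative choices, not part of the notes.

```python
from statistics import NormalDist

def p_value(z0, tail="two"):
    """p-value of a z statistic for a two-, upper-, or lower-tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # The absolute value keeps the result valid when z0 is negative.
        return 2 * (1 - std_normal.cdf(abs(z0)))
    if tail == "upper":
        return 1 - std_normal.cdf(z0)
    if tail == "lower":
        return std_normal.cdf(z0)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"

p = p_value(-2.5, tail="two")
print(f"p-value = {p:.4f}: {decide(p)}")
```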

Example Continuation

  • Proportions of smokers and nonsmokers with wrinkles, tested at $\alpha = 0.05$:
    • Pooled proportion:
    • $\bar{p}_{pooled} = \frac{95 + 105}{150 + 250} = 0.5$
    • Test statistic:
    • $z_0 = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_{pooled}(1 - \bar{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.633 - 0.42}{\sqrt{0.5 \times 0.5 \times \left(\frac{1}{150} + \frac{1}{250}\right)}} \approx \frac{0.213}{0.0516} \approx 4.1$ (about 4.2 if the intermediate values are rounded to two decimals)
    • p-value calculation:
    • p-value $= 2 \times P(Z > 4.1) \approx 0$ (less than 0.0001)
  • Conclusion:
    • Reject $H_0$: there is a significant difference between the proportions of smokers and nonsmokers with wrinkles.