Inference for Categorical Data: Two-Sample Z-Test for Proportions

# Overview of the Two-Sample Z-Test for Proportions

* The Two-Sample Z-Test is used to determine if there is a statistically significant difference between two population proportions.
* The test focuses on the difference between population proportion one ($p_1$) and population proportion two ($p_2$).
* The process involves taking a sample from population one and a sample from population two, then comparing the resulting sample proportions ($\hat{p}_1$ and $\hat{p}_2$).
* The goal is to determine if the observed sample proportions are far enough apart to conclude they represent a real difference in populations, or if the difference is close enough to be attributed to sampling variability.

# Step 1: Naming the Test and Stating Hypotheses

* Test Name: Two-Sample Z-Test for the Difference of Two Population Proportions.
* Null Hypothesis ($H_0$): Assumes there is no difference between the two population proportions ($H_0: p_1 = p_2$). This means the difference between them is exactly zero.
* Alternative Hypothesis ($H_a$): This is based on the specific claim or concern in the problem. There are three options:
  * Option 1: The proportion from population one is greater than the proportion from population two ($H_a: p_1 > p_2$).
  * Option 2: The proportion from population one is less than the proportion from population two ($H_a: p_1 < p_2$).
  * Option 3: The two proportions are simply not equal, regardless of which is higher or lower ($H_a: p_1 \neq p_2$).

# Step 2: Checking Conditions and Building the Sampling Distribution

* Conditions: The standard conditions (Random, Independent, and Normal/Large Counts) must be verified for both samples. On the AP Exam, it is common to see scenarios where these conditions are stated to be met.
* Mean of the Sampling Distribution: Our model assumes the null hypothesis is true. Therefore, the mean of the sampling distribution of the differences is centered at zero ($\mu_{\hat{p}_1 - \hat{p}_2} = 0$).
* Standard Deviation vs. Standard Error: Because the true population proportions are unknown, we cannot use the standard deviation formula. Instead, we must use sample data to calculate the Standard Error.
* Standard Error Combined (Pooling): Under the null hypothesis assumption that there is no difference ($p_1 = p_2$), the two samples are essentially from the same population. To reflect this, researchers combine the data into one single "combined sample proportion" ($\hat{p}_c$).
* Formula for Combined Sample Proportion ($\hat{p}_c$):
  * $\hat{p}_c = \frac{\text{Successes}_1 + \text{Successes}_2}{n_1 + n_2}$
  * Alternatively: $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
* Formula for Standard Error Combined ($SE_c$):
  * $SE_c = \sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}$
  * Mathematically equivalent version: $SE_c = \sqrt{\hat{p}_c(1 - \hat{p}_c) \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
* Note on Usage: Combining/pooling data is only performed for two-sample Z-tests, not for confidence intervals or one-sample tests.

# Step 3: Finding the Test Statistic and P-Value

* Observed Difference: Calculate the difference between the two sample proportions ($\hat{p}_1 - \hat{p}_2$).
* Test Statistic (Z-score): This informs us how many standard errors the observed difference is from the assumed center of zero.
  * $Z = \frac{(\text{Observed Difference}) - (\text{Hypothesized Difference})}{SE_c}$
  * Standard calculation: $Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_c}$
* P-Value Calculation: The P-value is the probability of observing a difference as extreme or more extreme than the one calculated, assuming the null hypothesis is true.
  * Use the normal table or a calculator (NormalCDF) with the calculated Z-score to determine this probability.

# Step 4: Making a Conclusion

* Comparison: Compare the calculated P-value to the level of significance (typically $\alpha = 0.05$ or $\alpha = 0.01$).
* Rejecting the Null: If the P-value is less than the significance level (P-value $< \alpha$), reject the null hypothesis. This means there is significant evidence to support the alternative hypothesis.
* Failing to Reject the Null: If the P-value is greater than the significance level (P-value $> \alpha$), fail to reject the null hypothesis. This means there is not enough evidence to support the alternative hypothesis. Do not "accept" the null; simply state there is insufficient evidence for the alternative.

# Case Study: Toy Company Defect Analysis

* Scenario: A CEO is concerned that the proportion of defective toys from the night shift is higher than the proportion from the day shift.
* Data Collection:
  * Sample size for both shifts: $500$ toys per shift ($n_n = 500$, $n_d = 500$).
  * Night shift defects: $62$ out of $500$ ($\hat{p}_n = 0.124$, or $12.4\%$).
  * Day shift defects: $37$ out of $500$ ($\hat{p}_d = 0.074$, or $7.4\%$).
  * Observed difference: $0.124 - 0.074 = 0.05$ (a difference of 5 percentage points).
* Step 1 (Naming and Hypotheses):
  * Test: Two-sample Z-test for the difference in proportions of defective toys.
  * $H_0: p_n = p_d$ (No difference in defect rates).
  * $H_a: p_n > p_d$ (Night shift defect rate is higher).
* Step 2 (Modeling):
  * Assumption: Under $H_0$, the sampling distribution of $\hat{p}_n - \hat{p}_d$ is centered at $0$.
  * Combined P-hat ($\hat{p}_c$): $\frac{62 + 37}{500 + 500} = \frac{99}{1000} = 0.099$.
  * Standard Error Combined ($SE_c$): $\sqrt{\frac{0.099(1 - 0.099)}{500} + \frac{0.099(1 - 0.099)}{500}} \approx 0.0189$.
* Step 3 (Test Statistic and P-Value):
  * $Z = \frac{0.05 - 0}{0.0189} \approx 2.6455$ (computed with the rounded $SE_c$; carrying more decimal places gives about $2.647$).
  * P-Value: $P(Z > 2.6455) \approx 0.0041$.
* Step 4 (Conclusion):
  * Since $0.0041 < 0.01$ (and therefore also less than $0.05$), we reject the null hypothesis.
  * There is significant evidence that the night shift produces a higher proportion of defective toys than the day shift.

# Summary of Significance Test Logic

* The sampling distribution shows all possible differences between two sample proportions that could occur just by chance.
* If an observed difference lands in the middle of the distribution, it is considered "not weird" or expected due to sampling variability; thus, we do not reject the null.
* If an observed difference lands in the extreme tails (very low P-value), it is "very unlikely" to have happened if the center was actually zero.
* The logical conclusion when something very unlikely happens is that the assumption (the null hypothesis) was wrong, leading us to support the alternative hypothesis.
* On the AP Exam, for a two-sample Z-test, failing to combine/pool proportions may still result in full credit if work is shown consistently, but using the combined standard error is the curriculum-preferred method.
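
The four steps above can be sketched in a few lines of Python, which may help when checking hand calculations. This is a minimal illustration, not a reference implementation: the names `two_proportion_z_test`, `normal_sf`, and `ZTestResult` are hypothetical helpers introduced here, and the code assumes the Random, Independent, and Large Counts conditions have already been verified. It uses only the standard library, computing the normal tail area from `math.erfc` in place of a table or NormalCDF.

```python
import math
from typing import NamedTuple


class ZTestResult(NamedTuple):
    """Container for the quantities computed in Steps 2 and 3 (hypothetical helper)."""
    p_pooled: float        # combined sample proportion, p-hat_c
    standard_error: float  # combined (pooled) standard error, SE_c
    z: float               # test statistic
    p_value: float         # tail probability under the null hypothesis


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int,
                          alternative: str = "greater") -> ZTestResult:
    """Pooled two-sample z-test for H0: p1 = p2.

    x1, x2 are success counts; n1, n2 are sample sizes.
    alternative is "greater" (p1 > p2), "less" (p1 < p2), or "two-sided".
    Assumes the test conditions have already been checked.
    """
    p_hat_1 = x1 / n1
    p_hat_2 = x2 / n2

    # Step 2: pool the samples, since H0 says they share one proportion.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

    # Step 3: standardize the observed difference against the hypothesized 0.
    z = ((p_hat_1 - p_hat_2) - 0) / se

    if alternative == "greater":
        p_value = normal_sf(z)
    elif alternative == "less":
        p_value = normal_sf(-z)
    elif alternative == "two-sided":
        p_value = 2 * normal_sf(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

    return ZTestResult(p_pooled, se, z, p_value)


# Toy company case study: night shift (62 of 500) vs. day shift (37 of 500).
result = two_proportion_z_test(62, 500, 37, 500, alternative="greater")
alpha = 0.01
print(result)
print("reject H0" if result.p_value < alpha else "fail to reject H0")
```

Because the code carries the standard error at full precision instead of rounding it to 0.0189, its Z-score (about 2.647) differs slightly from the hand-computed 2.6455; the P-value still rounds to 0.0041 and the conclusion is unchanged.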