Inference for Categorical Data: Two-Sample Z-Test for Proportions

# Overview of the Two-Sample Z-Test for Proportions

* The Two-Sample Z-Test is used to determine if there is a statistically significant difference between two population proportions.
* The test focuses on the difference between population proportion one ( $p_1$ ) and population proportion two ( $p_2$ ).
* The process involves taking a sample from population one and a sample from population two, then comparing the resulting sample proportions ( $\hat{p}_1$ and $\hat{p}_2$ ).
* The goal is to determine if the observed sample proportions are far enough apart to conclude they represent a real difference in populations, or if the difference is close enough to be attributed to sampling variability.

# Step 1: Naming the Test and Stating Hypotheses

* Test Name: Two-Sample Z-Test for the Difference of Two Population Proportions.
* Null Hypothesis ( $H_0$ ): Assumes there is no difference between the two population proportions ( $H_0: p_1 = p_2$ ). This means the difference between them is exactly zero.
* Alternative Hypothesis ( $H_a$ ): This is based on the specific claim or concern in the problem. There are three options:
    * Option 1: The proportion from population one is greater than the proportion from population two ( $H_a: p_1 > p_2$ ).
    * Option 2: The proportion from population one is less than the proportion from population two ( $H_a: p_1 < p_2$ ).
    * Option 3: The two proportions are simply not equal, regardless of which is higher or lower ( $H_a: p_1 \neq p_2$ ).

# Step 2: Checking Conditions and Building the Sampling Distribution

* Conditions: The standard conditions (Random, Independent, and Normal/Large Counts) must be verified for both samples. On the AP Exam, it is common to see scenarios where these conditions are stated to be met.
* Mean of the Sampling Distribution: Our model assumes the null hypothesis is true. Therefore, the mean of the sampling distribution of the differences is centered at zero ( $\mu_{\hat{p}_1 - \hat{p}_2} = 0$ ).
* Standard Deviation vs. Standard Error: Because the true population proportions are unknown, we cannot use the standard deviation formula. Instead, we must use sample data to calculate the Standard Error.
* Standard Error Combined (Pooling): Under the null hypothesis assumption that there is no difference ( $p_1 = p_2$ ), the two samples are essentially from the same population. To reflect this, researchers combine the data into one single "combined sample proportion" ( $\hat{p}_c$ ).
* Formula for Combined Sample Proportion ( $\hat{p}_c$ ):
    * $\hat{p}_c = \frac{\text{Successes}_1 + \text{Successes}_2}{n_1 + n_2}$
    * Alternatively: $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
* Formula for Standard Error Combined ( $SE_c$ ):
    * $SE_c = \sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}$
    * Mathematically equivalent version: $SE_c = \sqrt{\hat{p}_c(1 - \hat{p}_c) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
* Note on Usage: Combining/pooling data is only performed for two-sample Z-tests, not for confidence intervals or one-sample tests.

# Step 3: Finding the Test Statistic and P-Value

* Observed Difference: Calculate the difference between the two sample proportions ( $\hat{p}_1 - \hat{p}_2$ ).
* Test Statistic (Z-score): This informs us how many standard errors the observed difference is from the assumed center of zero.
    * $Z = \frac{(\text{Observed Difference}) - (\text{Hypothesized Difference})}{SE_c}$
    * Standard calculation: $Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_c}$
* P-Value Calculation: The P-value is the probability of observing a difference as extreme or more extreme than the one calculated, assuming the null hypothesis is true.
    * Use the normal table or a calculator (`NormalCDF`) with the calculated Z-score to determine this probability.

# Step 4: Making a Conclusion

* Comparison: Compare the calculated P-value to the level of significance (typically $\alpha = 0.05$ or $\alpha = 0.01$ ).
* Rejecting the Null: If the P-value is less than the significance level (P-value < $\alpha$ ), reject the null hypothesis. This means there is significant evidence to support the alternative hypothesis.
* Failing to Reject the Null: If the P-value is greater than the significance level (P-value > $\alpha$ ), fail to reject the null hypothesis. This means there is not enough evidence to support the alternative hypothesis. Do not "accept" the null; simply state there is insufficient evidence for the alternative.

# Case Study: Toy Company Defect Analysis

* Scenario: A CEO is concerned that the proportion of defective toys from the night shift is higher than the proportion from the day shift.
* Data Collection:
    * Sample size for both shifts: $500$ toys per shift ( $n_n = 500, n_d = 500$ ).
    * Night shift defects: $62$ out of $500$ ( $\hat{p}_n = 0.124$ or $12.4\%$ ).
    * Day shift defects: $37$ out of $500$ ( $\hat{p}_d = 0.074$ or $7.4\%$ ).
    * Observed difference: $0.124 - 0.074 = 0.05$ (a difference of $5$ percentage points).
* Step 1 (Naming and Hypotheses):
    * Test: Two-sample Z-test for the difference in proportions of defective toys.
    * $H_0: p_n = p_d$ (No difference in defect rates).
    * $H_a: p_n > p_d$ (Night shift defect rate is higher).
* Step 2 (Modeling):
    * Assumption: Center is $0$ .
    * Combined P-hat ( $\hat{p}_c$ ): $\frac{62 + 37}{500 + 500} = \frac{99}{1000} = 0.099$ .
    * Standard Error Combined ( $SE_c$ ): $\sqrt{\frac{0.099(1-0.099)}{500} + \frac{0.099(1-0.099)}{500}} = 0.0189$ .
* Step 3 (Test Statistic and P-Value):
    * $Z = \frac{0.05 - 0}{0.0189} = 2.6455$ .
    * P-Value: $P(Z > 2.6455) = 0.0041$ .
* Step 4 (Conclusion):
    * Since $0.0041 < 0.01$ (and therefore also $< 0.05$ ), we reject the null hypothesis.
    * There is significant evidence that the night shift produces a higher proportion of defective toys than the day shift.

# Summary of Significance Test Logic

* Sampling distributions show all possible differences between two samples that could occur just by chance.
* If an observed difference lands in the middle of the distribution, it is considered "not weird" or expected due to sampling variability; thus, we do not reject the null.
* If an observed difference lands in the extreme tails (very low P-value), it is "very unlikely" to have happened if the center were actually zero.
* The logical conclusion when something very unlikely happens is that the assumption (the null hypothesis) was wrong, leading us to support the alternative hypothesis.
* On the AP Exam, for a two-sample Z-test, failing to combine/pool proportions may still result in full credit if work is shown consistently, but using the combined standard error is the curriculum-preferred method.
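
The four steps of the toy company case study can be cross-checked with a short Python sketch. This is only an illustration under stated assumptions: the function name `two_proportion_z_test` and its parameters are hypothetical, and the normal tail probability is computed with `math.erfc` from the standard library rather than a table or calculator. Because the code keeps the unrounded standard error, its z-score (about 2.647) differs slightly from the hand calculation that divides by the rounded 0.0189.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2, alternative="greater"):
    """Pooled two-sample z-test for H0: p1 = p2 (hypothetical helper).

    x1, x2 are success counts; n1, n2 are sample sizes.
    alternative is "greater" (p1 > p2), "less" (p1 < p2), or "two-sided".
    Returns (p_c, se_c, z, p_value).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2

    # Combined (pooled) sample proportion, valid under H0: p1 = p2.
    p_c = (x1 + x2) / (n1 + n2)

    # Combined standard error of the difference in sample proportions.
    se_c = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))

    # Test statistic: observed difference minus hypothesized difference (0).
    z = (p1_hat - p2_hat - 0) / se_c

    # Upper-tail probability P(Z > z) for a standard normal variable.
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))

    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail
    elif alternative == "two-sided":
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

    return p_c, se_c, z, p_value


# Case study data: night shift 62 of 500 defective, day shift 37 of 500.
p_c, se_c, z, p_value = two_proportion_z_test(62, 500, 37, 500, "greater")
print(f"p_c = {p_c:.3f}, SE_c = {se_c:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")

alpha = 0.01
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The pooled proportion and standard error match the hand-computed values of 0.099 and 0.0189, and the P-value stays below $\alpha = 0.01$, so the conclusion to reject the null hypothesis is unchanged.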