Inference for Categorical Data: Two-Sample Z-Test for Proportions
# Overview of the Two-Sample Z-Test for Proportions
* The Two-Sample Z-Test is used to determine whether there is a statistically significant difference between two population proportions.
* The test focuses on the difference between population proportion one ($p_1$) and population proportion two ($p_2$).
* The process involves taking a sample from population one and a sample from population two, then comparing the resulting sample proportions ($\hat{p}_1$ and $\hat{p}_2$).
* The goal is to determine whether the observed sample proportions are far enough apart to conclude that they represent a real difference between the populations, or whether the difference is small enough to be attributed to sampling variability.

# Step 1: Naming the Test and Stating Hypotheses
* Test Name: Two-Sample Z-Test for the Difference of Two Population Proportions.
* Null Hypothesis ($H_0$): Assumes there is no difference between the two population proportions ($H_0: p_1 = p_2$). This means the difference between them is exactly zero.
* Alternative Hypothesis ($H_a$): This is based on the specific claim or concern in the problem. There are three options:
  * Option 1: The proportion from population one is greater than the proportion from population two ($H_a: p_1 > p_2$).
  * Option 2: The proportion from population one is less than the proportion from population two ($H_a: p_1 < p_2$).
  * Option 3: The two proportions are simply not equal, regardless of which is higher or lower ($H_a: p_1 \neq p_2$).

# Step 2: Checking Conditions and Building the Sampling Distribution
* Conditions: The standard conditions (Random, Independent, and Normal/Large Counts) must be verified for both samples. On the AP Exam, it is common to see scenarios where these conditions are stated to be met.
* Mean of the Sampling Distribution: Our model assumes the null hypothesis is true. Therefore, the sampling distribution of the difference in sample proportions is centered at zero ($\mu_{\hat{p}_1 - \hat{p}_2} = 0$).
* Standard Deviation vs. Standard Error: Because the true population proportions are unknown, we cannot use the standard deviation formula. Instead, we must use sample data to calculate the Standard Error.
* Standard Error Combined (Pooling): Under the null hypothesis assumption that there is no difference ($p_1 = p_2$), the two samples are essentially from the same population. To reflect this, researchers combine the data into one single "combined sample proportion" ($\hat{p}_c$).
* Formula for Combined Sample Proportion ($\hat{p}_c$):
  * $\hat{p}_c = \dfrac{\text{Successes}_1 + \text{Successes}_2}{n_1 + n_2}$
  * Alternatively: $\hat{p}_c = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$
* Formula for Standard Error Combined ($SE_c$):
  * $SE_c = \sqrt{\dfrac{\hat{p}_c (1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c (1 - \hat{p}_c)}{n_2}}$
  * Mathematically equivalent version: $SE_c = \sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
* Note on Usage: Combining/pooling data is only performed for two-sample Z-tests, not for confidence intervals or one-sample tests.

# Step 3: Finding the Test Statistic and P-Value
* Observed Difference: Calculate the difference between the two sample proportions ($\hat{p}_1 - \hat{p}_2$).
* Test Statistic (Z-score): This tells us how many standard errors the observed difference is from the assumed center of zero.
  * $Z = \dfrac{(\text{Observed Difference}) - (\text{Hypothesized Difference})}{SE_c}$
  * Standard calculation: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_c}$
* P-Value Calculation: The P-value is the probability of observing a difference as extreme or more extreme than the one calculated, assuming the null hypothesis is true.
  * Use the normal table or a calculator (NormalCDF) with the calculated Z-score to determine this probability.

# Step 4: Making a Conclusion
* Comparison: Compare the calculated P-value to the level of significance (typically $\alpha = 0.05$ or $\alpha = 0.01$).
* Rejecting the Null: If the P-value is less than the significance level (P-value < $\alpha$), reject the null hypothesis. This means there is significant evidence to support the alternative hypothesis.
* Failing to Reject the Null: If the P-value is greater than the significance level (P-value > $\alpha$), fail to reject the null hypothesis. This means there is not enough evidence to support the alternative hypothesis. Do not "accept" the null; simply state there is insufficient evidence for the alternative.

# Case Study: Toy Company Defect Analysis
* Scenario: A CEO is concerned that the proportion of defective toys from the night shift is higher than the proportion from the day shift.
* Data Collection:
  * Sample size for both shifts: 500 toys per shift ($n_n = 500$, $n_d = 500$).
  * Night shift defects: 62 out of 500 ($\hat{p}_n = 0.124$, or 12.4%).
  * Day shift defects: 37 out of 500 ($\hat{p}_d = 0.074$, or 7.4%).
  * Observed difference: $0.124 - 0.074 = 0.05$ (a difference of 5 percentage points).
* Step 1 (Naming and Hypotheses):
  * Test: Two-sample Z-test for the difference in proportions of defective toys.
  * $H_0: p_n = p_d$ (No difference in defect rates).
  * $H_a: p_n > p_d$ (Night shift defect rate is higher).
* Step 2 (Modeling):
  * Assumption: Center is 0.
  * Combined P-hat ($\hat{p}_c$): $\dfrac{62 + 37}{500 + 500} = \dfrac{99}{1000} = 0.099$.
  * Standard Error Combined ($SE_c$): $\sqrt{\dfrac{0.099 (1 - 0.099)}{500} + \dfrac{0.099 (1 - 0.099)}{500}} \approx 0.0189$.
* Step 3 (Test Statistic and P-Value):
  * $Z = \dfrac{0.05 - 0}{0.0189} \approx 2.6455$.
  * P-Value: $P(Z > 2.6455) \approx 0.0041$.
* Step 4 (Conclusion):
  * Since $0.0041 < 0.01$ (and therefore also less than 0.05), we reject the null hypothesis.
  * There is significant evidence that the night shift produces a higher proportion of defective toys than the day shift.

# Summary of Significance Test Logic
* Sampling distributions show all possible differences between two samples that could occur just by chance.
* If an observed difference lands in the middle of the distribution, it is considered "not weird" or expected due to sampling variability; thus, we do not reject the null.
* If an observed difference lands in the extreme tails (very low P-value), it is "very unlikely" to have happened if the center were actually zero.
* The logical conclusion when something very unlikely happens is that the assumption (the null hypothesis) was wrong, leading us to support the alternative hypothesis.
* On the AP Exam, for a two-sample Z-test, failing to combine/pool proportions may still result in full credit if work is shown consistently, but using the combined standard error is the curriculum-preferred method.
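
The pooled calculation from Steps 2 and 3 can be sketched in a few lines of Python, which may help when checking hand arithmetic such as the toy company case study. This is a minimal sketch, not an official AP or library method: the function name `two_prop_z_test` and its parameter names are hypothetical, and it uses `math.erfc` from the standard library as a stand-in for the normal table or NormalCDF.

```python
from math import erfc, sqrt


def two_prop_z_test(x1, n1, x2, n2, alternative="greater"):
    """Pooled two-sample z-test for H0: p1 = p2 (hypothetical helper).

    x1, x2 are success counts; n1, n2 are sample sizes.
    alternative is "greater" (p1 > p2), "less" (p1 < p2), or "two-sided".
    Returns (pooled proportion, pooled standard error, z, p-value).
    """
    if alternative not in ("greater", "less", "two-sided"):
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

    p1_hat = x1 / n1
    p2_hat = x2 / n2

    # Combined (pooled) sample proportion: all successes over all trials.
    p_c = (x1 + x2) / (n1 + n2)

    # Combined standard error under the assumption p1 = p2.
    se_c = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))

    # Observed difference minus the hypothesized difference of 0,
    # measured in standard errors.
    z = (p1_hat - p2_hat - 0) / se_c

    # Upper-tail area of the standard normal: P(Z > z).
    upper_tail = 0.5 * erfc(z / sqrt(2))

    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail
    else:
        p_value = 2 * min(upper_tail, 1 - upper_tail)

    return p_c, se_c, z, p_value


# Toy company case study: night shift (62 of 500) vs. day shift (37 of 500).
p_c, se_c, z, p_value = two_prop_z_test(62, 500, 37, 500, alternative="greater")
print(f"pooled p-hat = {p_c:.3f}, SE_c = {se_c:.4f}, z = {z:.4f}, p-value = {p_value:.4f}")
```

Because the code keeps the standard error unrounded instead of using 0.0189, its Z-score comes out slightly larger (about 2.647) than the hand-calculated 2.6455. The P-value still rounds to about 0.0041, so the conclusion is unchanged.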