Comparing Two Population Parameters: Exhaustive Guide to Proportions and Means
Distinguishing between Independent and Dependent Sampling
Objective 1: Distinguish between independent and dependent sampling.
Definitions: - Independent Sampling: A sampling method is considered independent when an individual selected for one sample does not dictate which individual is to be in a second sample. - Dependent Sampling: A sampling method is dependent when an individual selected to be in one sample is used to determine the individual in the second sample. - Matched-Pairs Samples: This is another term for dependent samples. In some cases, an individual might be matched against him- or herself (e.g., before-and-after measurements).
Examples of Distinguishing Sampling Methods: - Example (a): Hotel Prices: A researcher compares the price of a one-night stay at a Holiday Inn Express versus a Red Roof Inn. She randomly selects 8 towns where the hotel locations are close to each other. - Classification: Dependent. - Reasoning: Once a location is picked for the Holiday Inn Express, the Red Roof Inn must be chosen from the same or a nearby location. - Example (b): Comic Preferences: A polling agency obtained a random sample of 500 young adults (18-39) and 550 senior adults (60-89) and asked if they prefer Marvel or DC Comics. - Classification: Independent. - Reasoning: The selection of young adults in the first sample is not related to the selection of seniors in the second sample.
Hypothesis Testing for Two Population Proportions (Independent Samples)
Objective 2: Test hypotheses regarding two proportions from independent samples.
Sampling Distribution Principles: - To conduct inference, the sampling distribution of the difference of two sample proportions, \hat{p}_1 - \hat{p}_2, must be determined. When the samples come from random sampling or random assignment and are large enough, this distribution is approximately normal. - Parameters and Statistics: - Suppose a simple random sample of size n_1 is taken from population 1, where x_1 individuals have a specified characteristic. - A simple random sample of size n_2 is independently taken from population 2, where x_2 individuals have the same characteristic. - The sample proportions are defined as \hat{p}_1 = x_1 / n_1 and \hat{p}_2 = x_2 / n_2.
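Distribution Properties: - The sampling distribution of \hat{p}_1 - \hat{p}_2 is approximately normal if the requirements are met. - Mean: \mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2. - Standard Deviation: \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.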
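Standardized Test Statistic (Z-score): - z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, where \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} is the pooled estimate of the common proportion under H_0: p_1 = p_2. - This follows an approximate standard normal distribution.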
Requirements for the Hypothesis Test: - Samples must be independently obtained via simple random sampling or a completely randomized experiment with two levels of treatment. - Success-failure counts: n_1 \hat{p}_1 (1 - \hat{p}_1) \ge 10 and n_2 \hat{p}_2 (1 - \hat{p}_2) \ge 10. - Independence of observations: Each sample size must be no more than 5% of its population size (n_1 \le 0.05 N_1 and n_2 \le 0.05 N_2).
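The pooled z-test described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `two_proportion_z_test`, the `alternative` argument, and the counts in the usage lines are assumptions made for the example (the counts are made up and are not the ADOPT data). The size check assumes the n \hat{p} (1 - \hat{p}) \ge 10 form of the requirement.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z-test of H0: p1 = p2 for two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se

    cdf = NormalDist().cdf
    if alternative == "greater":      # H1: p1 > p2
        p_value = 1 - cdf(z0)
    elif alternative == "less":       # H1: p1 < p2
        p_value = cdf(z0)
    else:                             # H1: p1 != p2
        p_value = 2 * (1 - cdf(abs(z0)))

    # Size requirement in the n * p_hat * (1 - p_hat) >= 10 form.
    sizes_ok = (n1 * p1_hat * (1 - p1_hat) >= 10
                and n2 * p2_hat * (1 - p2_hat) >= 10)
    return z0, p_value, sizes_ok


# Made-up counts for illustration only (not the ADOPT data).
z0, p_value, sizes_ok = two_proportion_z_test(45, 500, 30, 500, "greater")
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}, requirements met: {sizes_ok}")
```

The 5% population condition is not checked here because it depends on population sizes that the function does not receive.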
Case Study 1: Diabetes Treatments (The ADOPT Study): - Context: Does Avandia (a diabetes drug) increase the proportion of patients who have heart attacks compared with other treatments? - Data points: - Group 1 (Avandia): , , . - Group 2 (Other): , , . - Requirement Verification: - (Check). - (Check). - Hypotheses: - H_0: p_1 = p_2 (or H_0: p_1 - p_2 = 0). - H_1: p_1 > p_2 (or H_1: p_1 - p_2 > 0). - Results: ; P-value = 0.1358. - Conclusion: Since 0.1358 > 0.05, we do not reject the null hypothesis. There is not sufficient evidence at the \alpha = 0.05 level that the proportion of heart attacks is higher with Avandia.
Case Study 2: Unplugging from Devices (Harris Interactive Survey): - Context: Is there a difference in feelings about returning to a time before being "plugged in" between adults (35-54) and baby-boomers (55+)? - Data points: - Group 1 (Adults): , , . - Group 2 (Boomers): , , . - Requirement Verification: - (Check). - (Check). - Hypotheses: - H_0: p_1 = p_2. - H_1: p_1 \ne p_2. - Results: ; P-value < 0.0001. - Conclusion: Since the P-value is less than \alpha, we reject the null hypothesis. There is sufficient evidence that the two age groups differ in the proportion who feel this way.
Confidence Intervals for Two Population Proportions
Objective 3: Construct and interpret confidence intervals for the difference between two population proportions.
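Formula for a Confidence Interval: - Lower Bound: (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} - Upper Bound: (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}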
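Requirement Check: Same as the hypothesis test requirements (independent random samples, counts n_1 \hat{p}_1 (1 - \hat{p}_1) \ge 10 and n_2 \hat{p}_2 (1 - \hat{p}_2) \ge 10, n_1 \le 0.05 N_1 and n_2 \le 0.05 N_2).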
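As a rough sketch of the interval formula, the following Python function computes both bounds from the sample counts. Unlike the hypothesis test, the standard error here is not pooled. The function name `two_proportion_ci` and the counts in the usage lines are assumptions made for illustration (the counts are made up and are not the Harris Interactive data).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Critical value z_{alpha/2}; about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_crit * se
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Made-up counts for illustration only.
lower, upper = two_proportion_ci(120, 400, 180, 450)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

When both bounds fall on the same side of 0, as they do for these made-up counts, the interval suggests the two population proportions differ.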
Case Study: Position on Divorce (Harris Interactive Survey): - Groups: Religious (, , ) and Not Religious (, , ). - Interval Calculation (95% Confidence): - Lower Bound: - Upper Bound: - Interpretation: We are 95% confident that the proportion of religious people who find divorce acceptable is between and less than that of non-religious individuals. - Note: Since 0 is not included in the interval and both bounds are negative (p_1 < p_2), we conclude there is a significant difference.
Sample Size Estimation for Proportions
Objective 4: Determine the sample size necessary for estimating the difference between two population proportions.
Calculation Rule: Always round up to the next integer for sample size calculations.
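Formula Case 1: Prior Estimates of p_1 and p_2 are available: - n = n_1 = n_2 = [\hat{p}_1(1 - \hat{p}_1) + \hat{p}_2(1 - \hat{p}_2)] \left(\frac{z_{\alpha/2}}{E}\right)^2, where \hat{p}_1 and \hat{p}_2 are the prior estimates and E is the desired margin of error.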
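Formula Case 2: Prior Estimates are unavailable: - n = n_1 = n_2 = 0.5 \left(\frac{z_{\alpha/2}}{E}\right)^2, which follows from using 0.5 for both proportions, since 0.5(1 - 0.5) + 0.5(1 - 0.5) = 0.5.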
Case Study: Prenatal Care Mothers (15-19 years vs. 30-34 years): - Goal: Estimate the difference within 2 percentage points (E = 0.02) with 95% confidence (z_{\alpha/2} = 1.96). - (a) With Prior Estimates (, ): - Result: Sample size required is . - (b) Without Prior Estimates (Using 0.5): - n_1 = n_2 = 0.5 (1.96 / 0.02)^2 = 4802. - Result: Sample size required is n_1 = n_2 = 4802.
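The two sample-size cases can be sketched as a single Python function. This is an illustrative sketch: the function name `sample_size_two_proportions` and its parameter names are assumptions, and the prior estimates used in the second call (0.6 and 0.8) are hypothetical placeholders, not the values from the prenatal care study.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(margin, confidence=0.95, p1=None, p2=None):
    """Per-group sample size n1 = n2 for estimating p1 - p2 within `margin`."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    if p1 is None or p2 is None:
        # No prior estimates: use 0.5 for both, so 0.25 + 0.25 = 0.5.
        variability = 0.5
    else:
        variability = p1 * (1 - p1) + p2 * (1 - p2)
    # Always round up to the next integer.
    return ceil(variability * (z_crit / margin) ** 2)


# Case without prior estimates, E = 0.02, 95% confidence.
print(sample_size_two_proportions(0.02))
# Hypothetical prior estimates, chosen only to show the call.
print(sample_size_two_proportions(0.02, p1=0.6, p2=0.8))
```

The first call gives 4802 per group for E = 0.02 at 95% confidence. The function uses an unrounded critical value rather than the table value 1.96, which can occasionally shift the rounded-up answer by one for other inputs.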
Inference about Two Means: Dependent Samples (Matched-Pairs)
Objective 1: Test hypotheses for a population mean from matched-pairs data.
Requirement Factors: - Sample obtained by simple random sampling or matched-pair design. - Samples are dependent. - Differences are normally distributed with no outliers, or n \ge 30. - Sample size n \le 0.05 N (no more than 5% of the population).
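Test Statistic for Matched Pairs: - t_0 = \frac{\bar{d}}{s_d / \sqrt{n}}, where \bar{d} is the mean and s_d the standard deviation of the n paired differences, tested against H_0: \mu_d = 0. - Follows Student's t-distribution with n - 1 degrees of freedom (d stands for "difference").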
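Objective 2: Confidence Intervals for Matched-Pairs Data. - Formula: \bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}, with t_{\alpha/2} taken at n - 1 degrees of freedom. - Point estimate \pm margin of error.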
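The matched-pairs statistic and interval can be sketched together, since both are built from the same paired differences. This is a minimal sketch under stated assumptions: the function name `matched_pairs_summary` is hypothetical, the measurements are made up, and the critical value is passed in from a t-table because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_summary(first, second, t_crit):
    """t statistic, df, and CI for the mean of differences d = first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)          # sample standard deviation (n - 1 divisor)
    se = s_d / sqrt(n)
    t0 = d_bar / se             # tests H0: mu_d = 0
    margin = t_crit * se
    return t0, n - 1, (d_bar - margin, d_bar + margin)


# Made-up paired measurements for illustration only.
before = [72, 75, 68, 80, 77, 70, 74, 69, 73, 76]
after = [78, 79, 70, 85, 80, 76, 75, 74, 77, 81]
# 2.262 is the table value of t_{0.025} for 9 degrees of freedom.
t0, df, (lower, upper) = matched_pairs_summary(after, before, t_crit=2.262)
print(f"t0 = {t0:.3f}, df = {df}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The order of subtraction sets the sign of every difference, so it should match the direction of the alternative hypothesis.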
Case Study: Hotel Prices (Hampton Inn vs. La Quinta): - Data collected from Dallas, Tampa Bay, St. Louis, Seattle, San Diego, Chicago, New Orleans, Phoenix, Atlanta, and Orlando (n = 10). - Objective: Construct a 95% confidence interval for the mean difference in price, using n - 1 = 9 degrees of freedom.
Case Study: Disney Wait Times: - Comparing wait times for "Pirates of the Caribbean" with those for "Tiana's Bayou Adventure" (formerly Splash Mountain). - Data are paired by specific day and time of day (18 observations). - Hypothesis: Wait times for Tiana's are longer, on average, than wait times for Pirates.
Inference about Two Means: Independent Samples
Objective 1: Test hypotheses regarding the difference of two independent means.
Behrens-Fisher Problem: The case where population variances are unequal and unknown. The solution used is Welch's approximate t.
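Welch's t-test Statistic: - t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, where \mu_1 - \mu_2 is the difference stated in H_0 (usually 0). - Degrees of Freedom (df): Use the smaller of n_1 - 1 or n_2 - 1.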
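A short Python sketch of the statistic and the conservative degrees-of-freedom rule is given here. The function name `welch_t` and the scores are assumptions made for illustration (the scores are made up). The sketch tests H_0: \mu_1 - \mu_2 = 0 and stops at the statistic and degrees of freedom, since the P-value needs a t-table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Welch's t statistic for H0: mu1 - mu2 = 0 with conservative df."""
    n1, n2 = len(sample1), len(sample2)
    # Unpooled standard error: each sample keeps its own variance.
    se = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t0 = (mean(sample1) - mean(sample2)) / se
    # Conservative rule: the smaller of n1 - 1 and n2 - 1.
    df = min(n1 - 1, n2 - 1)
    return t0, df


# Made-up scores for illustration only.
group1 = [82, 75, 90, 68, 77, 85, 73, 80]
group2 = [70, 65, 78, 72, 60, 74, 69]
t0, df = welch_t(group1, group2)
print(f"t0 = {t0:.3f}, df = {df}")
```

Statistical software typically uses a more precise degrees-of-freedom formula for Welch's test, so its P-values may be somewhat smaller than those from this conservative rule.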
Requirements: - Simple random samples or randomized experiment. - Independent samples. - Populations are normally distributed, or n_1 \ge 30 and n_2 \ge 30. - Each sample size is no more than 5% of its population.
Comparison Caution on Pooling: - Pooled two-sample t-tests assume that the two populations have equal variances. Because equality of variances is hard to verify, Welch's t is generally preferred.
Case Study: Exam Paper Color: - Question: Does the color of paper (White vs. Marine Blue) affect results? - Hypotheses: - H_0: \mu_1 = \mu_2 (or \mu_1 - \mu_2 = 0). - H_1: \mu_1 > \mu_2 (or \mu_1 - \mu_2 > 0). - Results: ; P-value = 0.0175. - Conclusion: Since 0.0175 < 0.05, reject the null hypothesis. There is sufficient evidence that mean scores are higher on white paper.
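Objective 2: Confidence Intervals for Independent Means. - Formula: (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, with t_{\alpha/2} taken at the smaller of n_1 - 1 or n_2 - 1 degrees of freedom. - Example: Salary by Degree: - Engineer (, , ) vs. Psychology (, , ). - 95% CI result: Between $22,972.98 and $29,224.03.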
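The interval for two independent means can be sketched from summary statistics alone. In this sketch the function name `welch_ci` is hypothetical, the summary values are made up and are not the salary data, and the critical value is supplied from a t-table at the smaller of n_1 - 1 or n_2 - 1 degrees of freedom.

```python
from math import sqrt


def welch_ci(x_bar1, s1, n1, x_bar2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x_bar1 - x_bar2
    margin = t_crit * se
    return diff - margin, diff + margin


# Made-up summary statistics for illustration only.
# 2.131 is the table value of t_{0.025} for min(16 - 1, 16 - 1) = 15 df.
lower, upper = welch_ci(50, 8, 16, 44, 6, 16, t_crit=2.131)
print(f"95% CI for mu1 - mu2: ({lower:.4f}, {upper:.4f})")
```

With these made-up values the standard error is exactly 2.5, which makes the margin of error easy to check by hand.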
Summary: Which Method to Use?
Step 1: Determine the Parameter: - Proportion (p): Use Normal distribution (z) provided n \hat{p} (1 - \hat{p}) \ge 10 for each sample and each sample size \le 5% of its population. - Mean (\mu): Categorize by sampling method.
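Step 2: Determine Sampling Method: - Independent Samples (Proportion): Use z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. - Independent Samples (Mean): Use Student's t with Welch's adjustment t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, df = smaller of n_1 - 1 or n_2 - 1. - Dependent Samples (Mean/Matched-Pairs): Use Student's t with t_0 = \frac{\bar{d}}{s_d / \sqrt{n}}, df = n - 1.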
Final Examples for Identification: - Economic System Fair Poll: Comparing 200 Democrats and 160 Republicans (Independent, Proportion). - Pulse Rates: Testing pulse before and after a fright (Dependent, Mean).
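The two-step decision in this summary can be written as a small lookup, which may help when classifying practice problems. This is only a sketch of the summary's logic: the function name `choose_method` and the string labels are assumptions made for illustration.

```python
def choose_method(parameter, sampling):
    """Return the procedure suggested by the two summary steps."""
    if parameter == "proportion" and sampling == "independent":
        return "two-proportion z-test"
    if parameter == "mean" and sampling == "independent":
        return "Welch's t, df = min(n1 - 1, n2 - 1)"
    if parameter == "mean" and sampling == "dependent":
        return "matched-pairs t, df = n - 1"
    # Other combinations are not covered in this guide.
    raise ValueError(f"no method listed for {parameter!r} with {sampling!r} samples")


# The two identification examples above.
print(choose_method("proportion", "independent"))  # Economic System Fair Poll
print(choose_method("mean", "dependent"))          # Pulse Rates
```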