Study Notes on Estimating a Difference in Proportions
Section 8.3 Estimating a Difference in Proportions
Learning Targets
- By the end of this section, you should be able to:
- Determine whether the conditions are met for constructing a confidence interval about a difference between two proportions.
- Construct and interpret a confidence interval for a difference between two proportions.
Introduction
- Section 8.2 covered confidence intervals for a single population proportion $p$.
- Statistical questions often involve comparing proportions of successes in two populations. Examples include:
- The difference between Democrats' and Republicans' support for the death penalty.
- Changes in smartphone ownership among teenagers over the past decade.
- Comparing treatment outcomes in experiments, such as the effectiveness of two medications.
- In each case, the goal is to estimate $p_1 - p_2$, where $p_1$ and $p_2$ are the proportions of successes in Population 1 and Population 2, respectively.
Confidence Intervals for $p_1 - p_2$
- For independent random samples or groups in a randomized experiment (the Random condition), the difference in sample proportions $\hat{p}_1 - \hat{p}_2$ is our best estimate of $p_1 - p_2$.
- The method also requires that the sampling distribution of the statistic $\hat{p}_1 - \hat{p}_2$ be approximately Normal.
- This is satisfied when:
- $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$, and $n_2(1 - p_2)$ are all at least 10.
- Because the true values of $p_1$ and $p_2$ are unknown during estimation, the sample proportions $\hat{p}_1$ and $\hat{p}_2$ are used to check the Large Counts condition.
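The claim about the sampling distribution can be explored with a small simulation. The sketch below is only an illustration: the population proportions and sample sizes are hypothetical values chosen so that all four expected counts are at least 10, and `simulate_differences` is a helper name invented here. It draws many pairs of independent samples and compares the simulated center and spread of $\hat{p}_1 - \hat{p}_2$ with $p_1 - p_2$ and the standard deviation formula.

```python
import math
import random

def simulate_differences(p1, n1, p2, n2, reps, seed=0):
    """Simulate `reps` values of p-hat1 - p-hat2 from two independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Hypothetical population values, chosen only for illustration
p1, n1, p2, n2 = 0.30, 100, 0.20, 150
diffs = simulate_differences(p1, n1, p2, n2, reps=2000)

sim_mean = sum(diffs) / len(diffs)
sim_sd = math.sqrt(sum((d - sim_mean) ** 2 for d in diffs) / (len(diffs) - 1))
theory_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean {sim_mean:.4f} vs p1 - p2 = {p1 - p2:.4f}")
print(f"simulated SD   {sim_sd:.4f} vs formula   = {theory_sd:.4f}")
```

The simulated mean and standard deviation should land close to the theoretical values. A histogram of `diffs` would show the roughly Normal shape.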
Conditions for Constructing a Confidence Interval
- To construct a confidence interval about a difference in proportions, the following conditions must be met:
- Random: The data should come from two independent random samples or randomized groups.
- 10% Condition: When sampling without replacement, check that $n_1 < 0.10N_1$ and $n_2 < 0.10N_2$.
- Large Counts: The counts of successes and failures in each sample or group, $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$, should all be at least 10.
- The Random condition is crucial for making inferences about the population, as it allows generalization from samples.
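The two numerical conditions can be checked mechanically. The sketch below is a minimal illustration, not a standard library routine: the function names and the sample data are hypothetical. The Random condition cannot be computed and still has to be judged from how the data were collected.

```python
def large_counts_ok(x1, n1, x2, n2, threshold=10):
    """True if successes and failures in both samples are all >= threshold."""
    counts = (x1, n1 - x1, x2, n2 - x2)
    return all(c >= threshold for c in counts)

def ten_percent_ok(n, population_size):
    """True if the sample is less than 10% of its population."""
    return n < 0.10 * population_size

# Hypothetical data: 45 successes out of 120, and 8 successes out of 90
print(large_counts_ok(45, 120, 8, 90))   # False: only 8 successes in sample 2
print(ten_percent_ok(120, 5000))         # True: 120 < 500
print(ten_percent_ok(90, 700))           # False: 90 is not less than 70
```

When either check fails, the interval method in these notes should not be used as is.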
Example: Preference for Brand Names
Problem Statement
- A Harris Interactive survey collected data from independent random samples of U.S. (n = 2309) and German (n = 1058) adults on brand name importance when purchasing clothes.
- Results:
- U.S. adults favoring brand names: 26%
- German adults favoring brand names: 22%
- Let $p_u$ be the true proportion of U.S. adults, and $p_g$ the true proportion of German adults, who think brand names are important.
Solution Steps
Checking Conditions:
- Random: The data is from independent random samples (2309 U.S. adults, 1058 German adults). ✓
- 10% Condition: 2309 is less than 10% of all U.S. adults, and 1058 is less than 10% of all German adults. ✓
- Large Counts:
- $n_1\hat{p}_u = 2309(0.26) = 600.34 \rightarrow 600$
- $n_1(1 - \hat{p}_u) = 2309(0.74) = 1708.66 \rightarrow 1709$
- $n_2\hat{p}_g = 1058(0.22) = 232.76 \rightarrow 233$
- $n_2(1 - \hat{p}_g) = 1058(0.78) = 825.24 \rightarrow 825$
- All four counts are at least 10. ✓
Calculating Confidence Interval:
Formula: statistic ± (critical value) × (standard error of statistic)
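Standard Error of the Difference:
When the conditions are met, the standard deviation of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is:
$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Because $p_1$ and $p_2$ are unknown, replacing them with the sample proportions gives the standard error:
$SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$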
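Confidence Interval: The critical value for 95% confidence is $z^* = 1.96$.
CI:
$(0.26 - 0.22) \pm 1.96 \times SE_{\hat{p}_u - \hat{p}_g}$
- Resulting Confidence Interval:
- With $SE_{\hat{p}_u - \hat{p}_g} \approx 0.0157$, the margin of error is $1.96 \times 0.0157 \approx 0.031$, so the calculation yields $(0.009, 0.071)$, or about $(0.01, 0.07)$.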
- Interpretation: We are 95% confident that the interval from 0.01 to 0.07 captures $p_u - p_g$, the true difference in the proportions of U.S. and German adults who think brand names are important. This suggests that the proportion is between 1 and 7 percentage points higher among U.S. adults.
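The arithmetic for this example can be reproduced with a short script. This is a sketch under stated assumptions rather than a reference implementation: `two_prop_z_interval` is a name made up for these notes, and the critical value comes from Python's `statistics.NormalDist` instead of a table, so it uses 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(p_hat1, n1, p_hat2, n2, confidence=0.95):
    """Two-sample z interval for p1 - p2: statistic +/- z* x standard error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    diff = p_hat1 - p_hat2
    margin = z_star * se
    return diff - margin, diff + margin

# Brand-name survey: U.S. sample first, German sample second
low, high = two_prop_z_interval(0.26, 2309, 0.22, 1058)
print(f"({low:.3f}, {high:.3f})")  # prints (0.009, 0.071)
```

Rounded to two decimal places, this matches the interval $(0.01, 0.07)$ reported above. Because `low` is greater than 0, the interval lies entirely above zero.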
Conclusions from Example
- Important Note: The confidence interval does not include 0, so it gives convincing evidence that the two population proportions differ.
- Misinterpretation Clarification: The interval estimates the difference $p_u - p_g$, not either proportion on its own, and 95% confidence is not certainty. It is therefore incorrect to state that brand names are definitively more important to U.S. adults than to German adults.
Application to Experimental Data
- The same method applies to data from randomized comparative experiments, with two adjustments: the parameters are the true proportions of successes for subjects like those in the study under each treatment, and the Random condition is met by random assignment. The 10% condition is not checked because no sampling from a population takes place.
- Example: in a prostate cancer study, 731 men were assigned to either surgery or observation. The true proportions $p_s$ and $p_o$ are defined as follows:
- $p_s$: true survival proportion for surgery group.
- $p_o$: true survival proportion for observation group.
- When reporting results, distinguish clearly between the sample statistics and the true parameters they estimate.
Summary of Conditions for Confidence Intervals
- Conditions must include:
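- Random: The data come from two independent random samples or from two groups in a randomized experiment.
- 10% Condition: When sampling without replacement, each sample must be less than 10% of its population.
- Large Counts: The observed counts of successes and failures in both samples or groups must all be at least 10.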
- A confidence interval gives the plausible values of $p_1 - p_2$, so it shows whether a claim such as "no difference" (a difference of 0) is plausible.