Study Notes on Estimating a Difference in Proportions

Section 8.3 Estimating a Difference in Proportions

By the end of this section, you should be able to:
- Determine whether the conditions are met for constructing a confidence interval about a difference between two proportions.
- Construct and interpret a confidence interval for a difference between two proportions.

In Section 8.2, we constructed confidence intervals for a single population proportion $p$.
Many statistical questions instead involve comparing the proportions of successes in two populations or treatment groups. Examples include:
- The difference between Democrats' and Republicans' support for the death penalty.
- The change in smartphone ownership among teenagers over the past decade.
- The difference in outcomes between treatments in an experiment, such as the effectiveness of two medications.
In each case, the goal is to estimate $p_1 - p_2$, where $p_1$ and $p_2$ are the proportions of successes in Population 1 and Population 2, respectively.

For independent random samples, or for the groups in a randomized experiment (the Random condition), the difference in sample proportions $\hat{p}_1 - \hat{p}_2$ is our best estimate of $p_1 - p_2$.
The interval also requires that the sampling distribution of the statistic $\hat{p}_1 - \hat{p}_2$ be approximately Normal.
- This is satisfied when $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$, and $n_2(1 - p_2)$ are all at least 10.
Because the true values of $p_1$ and $p_2$ are unknown during estimation, the sample proportions $\hat{p}_1$ and $\hat{p}_2$ are used to check the Large Counts condition.

To construct a confidence interval about a difference in proportions, the following conditions must be met:
1. Random: The data should come from two independent random samples or randomized groups.
2. 10% Condition: When sampling without replacement, check that $n_1 < 0.10N_1$ and $n_2 < 0.10N_2$.
3. Large Counts: The counts of successes and failures in each sample or group, $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$, should all be at least 10.
The Random condition is crucial for making inferences about the population, as it allows generalization from samples.

A Harris Interactive survey collected data from independent random samples of U.S. (n = 2309) and German (n = 1058) adults on brand name importance when purchasing clothes.
Results:
- U.S. adults favoring brand names: 26%
- German adults favoring brand names: 22%
Let $p_U$ be the true proportion of U.S. adults and $p_G$ the true proportion of German adults who think brand names are important.

Checking Conditions:
- Random: The data come from independent random samples of 2309 U.S. adults and 1058 German adults. ✓
- 10% Condition: $2309 < 10\%$ of all U.S. adults and $1058 < 10\%$ of all German adults. ✓
- Large Counts:
  - $n_1\hat{p}_U = 2309(0.26) = 600.34 \rightarrow 600$
  - $n_1(1 - \hat{p}_U) = 2309(0.74) = 1708.66 \rightarrow 1709$
  - $n_2\hat{p}_G = 1058(0.22) = 232.76 \rightarrow 233$
  - $n_2(1 - \hat{p}_G) = 1058(0.78) = 825.24 \rightarrow 825$
  - All values are at least 10 ✓
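
The Large Counts check above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `large_counts` and its parameters are hypothetical, and the counts are rounded to whole numbers because the reported percentages are themselves rounded.

```python
def large_counts(n1, p_hat1, n2, p_hat2, threshold=10):
    """Return (ok, counts) for the Large Counts condition.

    counts holds the rounded numbers of successes and failures in each
    sample; ok is True when every count is at least `threshold`.
    """
    raw = [n1 * p_hat1, n1 * (1 - p_hat1), n2 * p_hat2, n2 * (1 - p_hat2)]
    counts = [round(c) for c in raw]
    return all(c >= threshold for c in counts), counts


# Survey values from the notes: U.S. n = 2309 (26%), Germany n = 1058 (22%)
ok, counts = large_counts(2309, 0.26, 1058, 0.22)
print(ok, counts)  # True [600, 1709, 233, 825]
```

The 10% condition is not coded here because it depends on the population sizes, which the notes do not state numerically.
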
Calculating Confidence Interval:
- Formula: statistic ± (critical value) × (standard error of statistic)
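- When the conditions are met, the standard deviation of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is:
  $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
- Standard Error of Difference (the unknown $p_1$ and $p_2$ are replaced by the sample proportions):
  $SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Confidence Interval: The critical value for 95% confidence is $z^* = 1.96$, so
CI:
$(0.26 - 0.22) \pm 1.96 \times SE_{\hat{p}_1 - \hat{p}_2} = 0.04 \pm 1.96(0.0157) = 0.04 \pm 0.031$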

Resulting Confidence Interval:
- Calculation yields approximately $(0.01, 0.07)$
- Interpretation: We are 95% confident that the interval from 0.01 to 0.07 captures the difference (U.S. minus German) in the true proportions of adults who think brand names are important. The interval suggests that the proportion for U.S. adults is between 1 and 7 percentage points higher than the proportion for German adults.
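
The interval calculation above can be reproduced with a short Python sketch. The function name `two_proportion_ci` is hypothetical, and the code assumes the conditions have already been checked; it uses only the standard `math` module and the survey values given in the notes.

```python
import math


def two_proportion_ci(p_hat1, n1, p_hat2, n2, z_star=1.96):
    """Confidence interval for p1 - p2 from two independent samples."""
    diff = p_hat1 - p_hat2
    # Standard error of the difference in sample proportions
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    margin = z_star * se
    return diff - margin, diff + margin


# U.S.: 26% of 2309; Germany: 22% of 1058; z* = 1.96 for 95% confidence
lower, upper = two_proportion_ci(0.26, 2309, 0.22, 1058)
print(f"({lower:.3f}, {upper:.3f})")  # (0.009, 0.071)
```

Rounded to two decimal places, this matches the interval $(0.01, 0.07)$ reported above.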

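Important Note: The confidence interval does not include 0, so "no difference" is not a plausible value; the data give convincing evidence of a difference between the two population proportions.
Misinterpretation Clarification: The interval estimates the difference in proportions, not either proportion itself, and it does not make the conclusion certain. State the result as a difference in percentage points (between 1 and 7 points higher for U.S. adults) at 95% confidence, rather than claiming that brand names are definitively more important to U.S. adults.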

Data from randomized comparative experiments are analyzed with the same methodology for estimating a difference in proportions, with adjustments to how the parameters are defined and how the conditions are checked.
Example involving prostate cancer treatment: 731 men were randomly assigned to surgery or to observation. The true proportions $p_s$ and $p_o$ were defined as follows:
- $p_s$: true survival proportion for surgery group.
- $p_o$: true survival proportion for observation group.
When reporting results, it is important to distinguish sample statistics from the true parameters they estimate.

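In every setting, check the conditions before constructing the interval:
1. Random: The data come from independent random samples or from groups formed by random assignment.
2. 10% Condition: Needed only when sampling without replacement from a finite population; it does not apply to random assignment in an experiment where no sampling takes place.
3. Large Counts: The counts of successes and failures in each sample or group must all be at least 10.
A confidence interval gives the plausible values for the difference, so it shows whether a claim such as "no difference" (a difference of 0) is plausible.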
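
As a small illustration of that last point, the sketch below checks whether a claimed difference falls inside an interval. The helper name `is_plausible` is hypothetical, and the interval is the rounded $(0.01, 0.07)$ from the brand-name example.

```python
def is_plausible(claimed_difference, interval):
    """True when the claimed value of p1 - p2 lies inside the interval."""
    lower, upper = interval
    return lower <= claimed_difference <= upper


brand_name_interval = (0.01, 0.07)  # 95% interval from the survey example

# A difference of 0 means "no difference" between the two populations.
no_difference_plausible = is_plausible(0.0, brand_name_interval)
print(no_difference_plausible)  # False
```

Because 0 lies outside the interval, the claim of no difference is not plausible at the 95% confidence level.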