Study Notes on Estimating a Difference in Proportions

Section 8.3 Estimating a Difference in Proportions

Learning Targets

  • By the end of this section, you should be able to:
    • Determine whether the conditions are met for constructing a confidence interval about a difference between two proportions.
    • Construct and interpret a confidence interval for a difference between two proportions.

Introduction

  • In Section 8.2, confidence intervals were covered for a population proportion $p$.
  • Statistical questions often involve comparing proportions of successes in two populations. Examples include:
    • The difference between Democrats' and Republicans' support for the death penalty.
    • Changes in smartphone ownership among teenagers over the past decade.
  • In each case, the goal is to estimate $p_1 - p_2$, where $p_1$ and $p_2$ are the proportions of successes in Population 1 and Population 2, respectively.
  • The same approach applies to comparing treatment outcomes in experiments, such as the effectiveness of two medications.

Confidence Intervals for $p_1 - p_2$

  • For independent random samples or groups in a randomized experiment (satisfying the Random condition), the difference in sample proportions $\hat{p}_1 - \hat{p}_2$ is our best estimate of $p_1 - p_2$.
  • The sampling distribution of the statistic $\hat{p}_1 - \hat{p}_2$ must be approximately Normal.
    • This is satisfied when $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$, and $n_2(1 - p_2)$ are all at least 10.
  • Because the true values of $p_1$ and $p_2$ are unknown during estimation, the sample proportions $\hat{p}_1$ and $\hat{p}_2$ are used to verify the Large Counts condition.

Conditions for Constructing a Confidence Interval

  • To construct a confidence interval about a difference in proportions, the following conditions must be met:
    1. Random: The data should come from two independent random samples or randomized groups.
    2. 10% Condition: When sampling without replacement, check that $n_1 < 0.10N_1$ and $n_2 < 0.10N_2$.
    3. Large Counts: The counts of successes and failures in each sample or group, $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$, should all be at least 10.
  • The Random condition is what justifies inference: random sampling allows generalization from the samples to the populations, and random assignment allows conclusions about the effect of the treatments.
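The two numerical conditions above can be checked mechanically. The following is a minimal sketch, not a standard routine: the function name `check_conditions`, its parameter names, and the data in the usage lines are all hypothetical and made up for illustration. The Random condition depends on how the data were collected, so it cannot be checked by code.

```python
def check_conditions(x1, n1, N1, x2, n2, N2, min_count=10):
    """Check the 10% and Large Counts conditions for a two-proportion interval.

    x1, x2: observed numbers of successes (hypothetical parameter names)
    n1, n2: sample sizes
    N1, N2: population sizes
    """
    # Observed successes and failures in each sample,
    # i.e. n*p_hat and n*(1 - p_hat).
    counts = [x1, n1 - x1, x2, n2 - x2]
    large_counts = all(c >= min_count for c in counts)

    # Each sample must be less than 10% of its population.
    ten_percent = n1 < 0.10 * N1 and n2 < 0.10 * N2

    return {"ten_percent": ten_percent, "large_counts": large_counts}


# Made-up data: the second sample has only 8 successes, so Large Counts fails.
result = check_conditions(x1=45, n1=150, N1=4000, x2=8, n2=60, N2=900)
print(result)
```

Returning both checks separately, rather than a single pass/fail value, makes it clear which condition needs attention.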

Example: Preference for Brand Names

Problem Statement

  • A Harris Interactive survey collected data from independent random samples of U.S. (n = 2309) and German (n = 1058) adults on brand name importance when purchasing clothes.
  • Results:
    • U.S. adults favoring brand names: 26%
    • German adults favoring brand names: 22%
  • Let $p_U$ be the true proportion of U.S. adults, and $p_G$ the true proportion of German adults, who think brand names are important.

Solution Steps

  1. Checking Conditions:

    • Random: The data is from independent random samples (2309 U.S. adults, 1058 German adults). ✓
    • 10% Condition: 2309 is less than 10% of all U.S. adults, and 1058 is less than 10% of all German adults. ✓
    • Large Counts:
      • $n_1\hat{p}_U = 2309(0.26) = 600.34 \rightarrow 600$
      • $n_1(1 - \hat{p}_U) = 2309(0.74) = 1708.66 \rightarrow 1709$
      • $n_2\hat{p}_G = 1058(0.22) = 232.76 \rightarrow 233$
      • $n_2(1 - \hat{p}_G) = 1058(0.78) = 825.24 \rightarrow 825$
      • All counts are at least 10. ✓
  2. Calculating Confidence Interval:

    • Formula: statistic ± (critical value) × (standard error of statistic)

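    • Standard deviation of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ when the conditions are met:

      $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

    • Standard error of the difference (the unknown $p_1$ and $p_2$ are replaced with the sample proportions):

      $SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$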

  3. Confidence Interval: The critical value for 95% confidence is $z^* = 1.96$:

    CI:
    $(0.26 - 0.22) \pm 1.96 \times SE_{\hat{p}_1 - \hat{p}_2} = 0.04 \pm 1.96(0.0157) \approx 0.04 \pm 0.031$

  • Resulting Confidence Interval:
    • The calculation yields $(0.01, 0.07)$.
    • Interpretation: We are 95% confident that the interval from 0.01 to 0.07 captures the true difference $p_U - p_G$ in the proportions of U.S. and German adults who think brand names are important when buying clothes. That is, the proportion is between 1 and 7 percentage points higher for U.S. adults.
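The arithmetic in this example can be reproduced with a short script. This is a minimal sketch using only the sample proportions and sample sizes given in the problem statement; the function name `two_prop_interval` and its parameter names are hypothetical, chosen for illustration.

```python
import math


def two_prop_interval(p_hat1, n1, p_hat2, n2, z_star=1.96):
    """Interval for p1 - p2: statistic ± (critical value) × (standard error)."""
    diff = p_hat1 - p_hat2
    # Standard error of the difference in sample proportions.
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    margin = z_star * se
    return diff - margin, diff + margin


# U.S. sample: 26% of 2309; German sample: 22% of 1058.
low, high = two_prop_interval(0.26, 2309, 0.22, 1058)
print(round(low, 2), round(high, 2))  # prints 0.01 0.07
```

The default `z_star=1.96` corresponds to 95% confidence; a different confidence level would need a different critical value.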

Conclusions from Example

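  • Important Note: The confidence interval does not include 0, so a difference of 0 (no difference between the two proportions) is not a plausible value. The data give convincing evidence that the two population proportions differ.
  • Misinterpretation Clarification: The interval estimates the difference $p_U - p_G$, not either proportion on its own, and it is a statement of 95% confidence rather than certainty. It would be incorrect to claim that brand names are definitively more important to U.S. adults than to German adults.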
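To illustrate what "95% confident" refers to, the sketch below simulates many pairs of samples and counts how often the interval captures the true difference. It is a hedged illustration, not part of the original example: the "true" proportions 0.26 and 0.22 are assumed values (the sample results treated as if they were the truth), and the function name `simulate_coverage`, the seed, and the number of repetitions are arbitrary choices.

```python
import math
import random


def simulate_coverage(p1, n1, p2, n2, reps=1000, z_star=1.96, seed=1):
    """Fraction of simulated intervals that capture the true p1 - p2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_diff = p1 - p2
    hits = 0
    for _ in range(reps):
        # Draw two independent samples and count the successes in each.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        ph1, ph2 = x1 / n1, x2 / n2
        se = math.sqrt(ph1 * (1 - ph1) / n1 + ph2 * (1 - ph2) / n2)
        diff = ph1 - ph2
        if diff - z_star * se <= true_diff <= diff + z_star * se:
            hits += 1
    return hits / reps


# Assumed "true" proportions, with the sample sizes from the example.
coverage = simulate_coverage(0.26, 2309, 0.22, 1058)
print(coverage)
```

The printed fraction should be close to 0.95: the confidence level describes how often the method succeeds over many samples, not the chance that one particular interval is correct.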

Application to Experimental Data

  • Data from randomized comparative experiments are analyzed with the same method for estimating a difference in proportions, with adjustments to how the parameters are defined and how the conditions are checked.
  • Example involving prostate cancer treatment: 731 men were assigned to either surgery or observation. The true proportions $p_s$ and $p_o$ were defined as follows:
    • $p_s$: true survival proportion for the surgery group.
    • $p_o$: true survival proportion for the observation group.
  • It is important to distinguish sample statistics from the true parameters when reporting results.

Summary of Conditions for Confidence Intervals

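  • The following conditions must be checked:
    1. Random: The data come from two independent random samples or from randomized groups in an experiment.
    2. 10% Condition: When sampling without replacement, each sample is less than 10% of its population.
    3. Large Counts: The counts of successes and failures in each sample or group are all at least 10.
  • A confidence interval shows which values of $p_1 - p_2$ are plausible. A claim such as no difference is plausible only if 0 lies inside the interval.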