Confidence Intervals for Proportions

Introducing Confidence Intervals

  • A confidence interval gives a lower and an upper bound for an unknown value (such as a forecast or a population proportion), written as (lower, upper).

  • Example: A weatherman's confidence interval for the temperature is (-40°, 200°). This interval is almost certainly correct, but it is so wide that it is not precise: a wide interval reflects a large margin of error.

  • Polls often include a margin of error to account for sampling variation.

  • The sample proportion (p-hat) is an estimate of the population proportion (p).

  • Example: A poll indicates 56% of voters disapprove of President Biden's job performance (p-hat = 0.56). The actual population proportion (p) is likely close to this.

  • The margin of error (MOE) is added to and subtracted from the sample proportion to create a confidence interval.

  • Example: A poll with a 56% disapproval rate and a 2.3% margin of error yields a confidence interval of (53.7%, 58.3%).

  • If the true population proportion were 58% disapproval, this interval would still capture it, even though 58% lies near the interval's upper edge.
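As a quick check of the poll arithmetic above, here is a minimal Python sketch. The helper names `interval_from_moe` and `captures` are hypothetical, introduced only for illustration; the numbers (p-hat = 0.56, MOE = 0.023, a possible true value of 0.58) come from the example.

```python
def interval_from_moe(p_hat: float, moe: float) -> tuple[float, float]:
    """Build (lower, upper) by subtracting and adding the margin of error."""
    return (p_hat - moe, p_hat + moe)

def captures(interval: tuple[float, float], value: float) -> bool:
    """True if value lies inside the closed interval [lower, upper]."""
    lower, upper = interval
    return lower <= value <= upper

# Poll example from the notes: p-hat = 0.56, MOE = 0.023
ci = interval_from_moe(0.56, 0.023)
print(f"({ci[0]:.3f}, {ci[1]:.3f})")  # (0.537, 0.583)
print(captures(ci, 0.58))             # True
```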

Level of Confidence

  • Unless stated otherwise, assume a 95% confidence interval.

  • Approximately 95% of observations in a Normal model fall within ~2 standard deviations of the mean.

  • More precisely, 95% of a Normal model lies within 1.96 standard deviations of the mean, so a 95% confidence interval uses z* = 1.96.
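The 1.96 comes straight from the standard Normal model. A minimal sketch using Python's standard-library `statistics.NormalDist` (our choice of tool, not something the notes prescribe) shows both the "about 95% within 2 SDs" rule of thumb and where z* = 1.96 comes from; `critical_value` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a Normal model within 2 SDs of the mean (the "about 95%" rule)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"{within_2sd:.4f}")  # 0.9545

def critical_value(confidence: float) -> float:
    """z* that leaves (1 - confidence) / 2 in each tail of N(0, 1)."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)

print(round(critical_value(0.95), 2))  # 1.96
```

The same function reproduces the other critical values used in these notes (about 1.645 for 90% and 2.576 for 99%).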

Calculating Confidence Intervals

  • The sampling distribution of proportions has a standard deviation of $$\sqrt{\frac{p(1-p)}{n}}$$.

  • Since p (the population proportion) is unknown, we estimate this standard deviation by substituting the sample proportion p-hat for p; the resulting estimate is called the standard error (SE).

  • The standard error is calculated as: $$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$.

  • The 68-95-99.7 Rule: Approximately 68% of samples have p-hats within 1 SE of p, 95% within 2 SEs, and 99.7% within 3 SEs.

  • The formula for a confidence interval is: $$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$, where $$\hat{p}$$ is the sample proportion, $$n$$ is the sample size, and $$z^*$$ is the critical value.

  • Critical values for common confidence levels:

    • 90% confidence interval: z* = 1.645

    • 95% confidence interval: z* = 1.96

    • 99% confidence interval: z* = 2.576

  • Example: Margin of error calculation for a 95% confidence interval with a sample of 1856 people, where 56% disapprove of President Biden:

    • $$MOE = 1.96 \sqrt{\frac{.56(1-.56)}{1856}} \approx 0.02258$$

    • Confidence Interval: $$.56 \pm .02258 = (.5374, .5826)$$

  • Example: Margin of error calculation for a 99% confidence interval with the same sample:

    • $$MOE = 2.576 \sqrt{\frac{.56(1-.56)}{1856}} \approx 0.02968$$

    • Confidence Interval: $$.56 \pm .02968 = (.5303, .5897)$$

  • To decrease the margin of error:

    1. Decrease the confidence level.

    2. Increase the sample size; to halve the standard error/MOE, the sample size must be quadrupled.
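The calculation steps above can be reproduced with a short Python sketch. It hard-codes the z* table from these notes in a hypothetical `Z_STAR` dictionary (not a library API), and the function names are illustrative. It recomputes both poll examples and checks that quadrupling n halves the MOE.

```python
import math

# z* values from the table in these notes (hypothetical lookup, not a library API)
Z_STAR = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def standard_error(p_hat: float, n: int) -> float:
    """Plug-in estimate of SD(p-hat): sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def proportion_ci(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (moe, lower, upper) for p_hat ± z* * SE."""
    moe = Z_STAR[confidence] * standard_error(p_hat, n)
    return moe, p_hat - moe, p_hat + moe

# Poll from the notes: n = 1856, p-hat = 0.56
for level in (0.95, 0.99):
    moe, lower, upper = proportion_ci(0.56, 1856, level)
    print(f"{level:.0%}: MOE = {moe:.5f}, CI = ({lower:.4f}, {upper:.4f})")
# 95%: MOE = 0.02258, CI = (0.5374, 0.5826)
# 99%: MOE = 0.02968, CI = (0.5303, 0.5897)

# Quadrupling n (same p-hat, same confidence level) halves the MOE
moe_n, _, _ = proportion_ci(0.56, 1856)
moe_4n, _, _ = proportion_ci(0.56, 4 * 1856)
print(round(moe_n / moe_4n, 6))  # 2.0
```

The ratio is exactly 2 because n sits under a square root: multiplying n by 4 divides the SE, and therefore the MOE, by √4 = 2.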

Interpreting Confidence Intervals

  • Confidence intervals vary because they are based on sample statistics.

  • Confidence lies in the process of constructing the interval, not in any single interval itself.

  • We expect 95% of all 95% confidence intervals to contain the true population parameter.

  • Interpretation: "If the process were repeated many times, [confidence level]% of samples of this size will produce confidence intervals that capture the true proportion of [context]."

  • For a single confidence interval: "We are [confidence level]% confident that the true proportion of [context] is between [lower%] and [upper%]."

  • Example: Interpreting unemployment rate confidence intervals:

    • If confidence intervals for North Carolina and New Jersey do not overlap, their unemployment rates are considered different.

    • New Jersey: We are 95% confident the true unemployment rate is between 5.6% and 7%.

    • North Carolina: We are 95% confident the true unemployment rate is between 3.2% and 4.3%.
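The claim that about 95% of all 95% intervals capture the true parameter can be checked by simulation. Below is a minimal Python sketch that assumes a hypothetical true proportion p = 0.56 and samples of n = 1000 (neither is real data): it draws many samples, builds a 95% interval from each, and counts how often the interval contains p. It also includes a hypothetical `intervals_overlap` helper applied to the unemployment intervals above.

```python
import math
import random

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def simulate_coverage(p: float, n: int, reps: int, seed: int = 0) -> float:
    """Fraction of `reps` simulated 95% intervals that capture the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Each draw is a "success" with probability p (hypothetical population)
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_interval(successes, n)
        if lower <= p <= upper:
            hits += 1
    return hits / reps

def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

coverage = simulate_coverage(p=0.56, n=1000, reps=1000)
print(f"{coverage:.3f}")  # close to 0.95; the exact value depends on the seed

new_jersey = (0.056, 0.070)
north_carolina = (0.032, 0.043)
print(intervals_overlap(new_jersey, north_carolina))  # False
```

Any single simulated interval either contains p or it does not; the roughly 95% figure describes the process across many samples, which is the interpretation given in the notes.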

Conditions for Confidence Intervals

  • Assumptions and Conditions:

    • Independence Assumption:

      • Randomization Condition: The data should come from a random sample or a randomized experiment.

      • 10% Condition: The sample size should be no more than 10% of the population.

    • Sample Size Assumption:

      • Success/Failure Condition: Expect at least 10 successes and 10 failures, i.e., $$n\hat{p} \ge 10$$ and $$n(1-\hat{p}) \ge 10$$.

  • Example: Rotten Tomatoes rating for Sing 2:

    • Conditions:

      1. Independence: We assume the critics are randomly chosen and do not influence each other.

      2. 10% Condition: 119 critics are fewer than 10% of all movie critics.

      3. Success/Failure: 82
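The two numeric conditions can be checked mechanically; randomization and independence cannot, and must be judged from how the data were collected. Below is a minimal sketch with a hypothetical `check_conditions` helper. The population sizes passed in are placeholders, not real figures.

```python
def check_conditions(n: int, p_hat: float, population_size: int) -> dict[str, bool]:
    """Check the numeric conditions only; randomization is judged from the study design."""
    return {
        # Sample is no more than 10% of the population
        "10% condition": n <= 0.10 * population_size,
        # At least 10 expected successes and 10 expected failures
        "success/failure": n * p_hat >= 10 and n * (1 - p_hat) >= 10,
    }

# Poll from the notes; the population size is a hypothetical placeholder
print(check_conditions(n=1856, p_hat=0.56, population_size=100_000_000))
# {'10% condition': True, 'success/failure': True}

# Hypothetical small sample: only 30 * 0.2 = 6 expected successes
print(check_conditions(n=30, p_hat=0.2, population_size=10_000))
# {'10% condition': True, 'success/failure': False}
```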

