Confidence intervals give a lower and an upper bound for an estimate, written as (lower, upper).
Example: A weatherman's confidence interval is (-40°, 200°). This interval is very likely to be correct but not precise due to its width, indicating a large margin of error.
Polls often include a margin of error to account for sampling variation.
The sample proportion (p-hat) is a point estimate of the population proportion (p).
Example: A poll indicates 56% of voters disapprove of President Biden's job performance (p-hat = 0.56). The actual population proportion (p) is likely close to this.
The margin of error is added to and subtracted from the sample proportion to create a confidence interval.
Example: A poll with a 56% disapproval rate and a 2.3% margin of error yields a confidence interval of (53.7%, 58.3%).
Even if the true population proportion is near the interval's edge (e.g., 58% disapproval), the confidence interval still captures it, because 58% lies between 53.7% and 58.3%.
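The poll example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name interval_from_moe is hypothetical, and the 58% "true" proportion is the hypothetical value from the example, since in practice p is unknown.

```python
def interval_from_moe(p_hat, moe):
    """Return (lower, upper) by subtracting and adding the margin of error."""
    return (p_hat - moe, p_hat + moe)

# Poll from the example: 56% disapproval, 2.3% margin of error.
lower, upper = interval_from_moe(0.56, 0.023)

# Hypothetical true proportion from the example (unknown in a real poll).
true_p = 0.58
captures = lower <= true_p <= upper

print(f"interval = ({lower:.3f}, {upper:.3f}), captures {true_p}: {captures}")
```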
Level of Confidence
Unless stated otherwise, assume a 95% confidence interval.
Approximately 95% of observations in a Normal model fall within ~2 standard deviations of the mean.
More precisely, a 95% confidence interval uses a critical value of 1.96 standard deviations (z* = 1.96).
Calculating Confidence Intervals
The sampling distribution of proportions has a standard deviation of \sqrt{\frac{p(1-p)}{n}}.
Since 'p' (population proportion) is unknown, we estimate the standard deviation using sample statistics, which we call the standard error.
The standard error is calculated as: \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.
The 68-95-99.7 Rule: When the sampling distribution is approximately Normal, about 68% of samples have p-hats within 1 SE of p, about 95% within 2 SEs, and about 99.7% within 3 SEs.
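The percentages in this rule can be checked against the standard Normal model. The sketch below uses NormalDist from Python's standard statistics module; the helper name prob_within is hypothetical, and the check assumes the Normal model is a reasonable approximation for the sampling distribution.

```python
from statistics import NormalDist

def prob_within(k):
    """Probability that a standard Normal value lies within k SDs of the mean."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The three values come out close to 0.68, 0.95, and 0.997, which is where the rule gets its name.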
The formula for a confidence interval is: \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, where \hat{p} is the sample proportion, n is the sample size, and z* is the critical value for the chosen confidence level.
Critical values for common confidence levels:
90% confidence interval: z* = 1.645
95% confidence interval: z* = 1.96
99% confidence interval: z* = 2.576
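The critical values listed above can be recovered from the Normal model rather than memorized. As a rough sketch (the function name critical_value is hypothetical), z* is the point that leaves half of the leftover probability in each tail:

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return z* for a two-sided interval at the given confidence level (e.g. 0.95)."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```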
Example: Margin of error calculation for a 95% confidence interval with a sample of 1856 people, where 56% disapprove of President Biden: 1.96 \sqrt{\frac{0.56(1-0.56)}{1856}} \approx 1.96 \times 0.0115 \approx 0.0226, or about 2.3%.
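The same calculation can be sketched in Python, assuming the formula given earlier (z* times the standard error). The function name margin_of_error and its default z_star=1.96 are illustrative choices, not part of the original notes.

```python
from math import sqrt

def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error for a sample proportion: z* times the standard error."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error

p_hat, n = 0.56, 1856
moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe

print(f"MOE = {moe:.4f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

The margin of error comes out to roughly 0.0226, which rounds to the 2.3% used in the earlier poll example.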
To reduce the margin of error, increase the sample size. Because the standard error is proportional to 1/\sqrt{n}, the sample size must be quadrupled to halve the standard error/MOE.
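The quadrupling claim follows from the square root of n in the denominator, and a quick numeric check makes it concrete. This sketch reuses the poll numbers from the example (p-hat = 0.56, n = 1856); the function name standard_error is hypothetical.

```python
from math import sqrt

def standard_error(p_hat, n):
    """Standard error of a sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)

se_n = standard_error(0.56, 1856)        # original sample size
se_4n = standard_error(0.56, 4 * 1856)   # four times as many respondents
ratio = se_4n / se_n

print(f"SE at n: {se_n:.4f}, SE at 4n: {se_4n:.4f}, ratio: {ratio:.2f}")
```

Since the margin of error is just z* times the standard error, the MOE shrinks by the same factor of one half.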
Interpreting Confidence Intervals
Confidence intervals vary because they are based on sample statistics.
Confidence lies in the process of constructing the interval, not in any single interval itself.
We expect 95% of all 95% confidence intervals to contain the true population parameter.
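A small simulation can illustrate this long-run claim. The sketch below assumes a true proportion of 0.58 and a sample size of 1856 (both borrowed from the earlier examples purely for illustration), draws many samples, builds a 95% interval from each, and counts how often the interval captures the true value. The function name coverage and its parameters are hypothetical.

```python
import random
from math import sqrt

def coverage(p=0.58, n=1856, trials=1000, z_star=1.96, seed=0):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its sample proportion.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / trials

print(f"share of intervals capturing p: {coverage():.3f}")
```

The printed share should land near 0.95, though any single run will differ slightly because the intervals themselves vary from sample to sample.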
Interpretation: "If the process were repeated many times, [confidence level]% of samples of this size will produce confidence intervals that capture the true proportion of [context]."
For a single confidence interval: "We are [confidence level]% confident that the true proportion of [context] is between [lower%] and [upper%]."