AP Statistics Unit 6 (Proportions): Learning to Estimate a Population Proportion with Confidence
Introduction to Confidence Intervals
What a confidence interval is (and what it isn’t)
In statistics, you usually want to learn about a population parameter—a fixed but typically unknown value that describes a whole population. For this section, the parameter is the population proportion, written as p, meaning “the true proportion of individuals in the population with some characteristic of interest.”
Because you can rarely measure an entire population, you take a random sample and compute a statistic (a number computed from the sample) to estimate the parameter. For proportions, the statistic is the sample proportion:
\hat{p} = \frac{x}{n}
Here, x is the number of “successes” (individuals in the sample with the characteristic) and n is the sample size.
A single number like \hat{p} is a point estimate. But point estimates wiggle from sample to sample due to random sampling variability. A confidence interval (CI) is designed to capture that uncertainty by giving an interval of plausible values for the parameter.
A confidence interval for a parameter is an interval computed from sample data that is intended to capture the true parameter value with a stated long-run success rate (the confidence level), under repeated random sampling.
It’s crucial to understand what the confidence level means:
- A 90% confidence method will produce intervals that contain the true parameter about 90% of the time in the long run, if you repeatedly take random samples the same way and build intervals the same way.
- Once you compute a specific interval from your one sample, the interval either contains p or it doesn’t. There is no probability attached to p “being in the interval” after the interval is made (in the usual AP Statistics interpretation).
So you should avoid the common incorrect phrasing “There is a 95% chance that p is in this interval.” The correct idea is about the procedure’s long-run performance, not a probability statement about a fixed parameter.
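The long-run meaning of the confidence level can be demonstrated with a short simulation. The sketch below is illustrative only (Python is a language choice for this note, not something the AP exam requires, and the true proportion, sample size, and number of trials are made-up values): we fix a known p, repeatedly draw random samples, build a 95% interval from each, and count how often the intervals capture p.

```python
# Simulation sketch of the long-run "95% confidence" idea (illustrative;
# p_true, n, and the trial count are made-up values).
import random
import math

def covers(p_true, n, z_star=1.96):
    """Draw one sample, build a 95% CI, and report whether it captures p_true."""
    x = sum(random.random() < p_true for _ in range(n))  # count of successes
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - z_star * se, p_hat + z_star * se
    return lower <= p_true <= upper

random.seed(1)
trials = 10_000
hit_rate = sum(covers(p_true=0.54, n=500) for _ in range(trials)) / trials
print(f"Fraction of intervals containing p: {hit_rate:.3f}")  # close to 0.95
```

Each individual interval either captures p or it doesn't; the 95% describes the method's long-run hit rate across many intervals.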
Why confidence intervals matter
Confidence intervals are one of the main tools of statistical inference—using sample data to draw conclusions about a population. They matter because:
- They quantify uncertainty. Two studies might have the same \hat{p} but different sample sizes; a confidence interval will show which estimate is more precise.
- They help you make practical decisions. For example, if a CI for the proportion of defective parts is well below a tolerance threshold, you can be more comfortable with a manufacturing process.
- They connect to hypothesis tests. For many AP Stats contexts, a CI corresponds to a two-sided test at a related significance level (conceptually, values outside the interval would be rejected by the test).
The big picture: sampling distributions and “standard error”
Confidence intervals are built using the idea of a sampling distribution: if you repeatedly sample and compute \hat{p} each time, those \hat{p} values form a distribution.
For a large enough sample, \hat{p} is approximately Normal with:
- Center at p
- Standard deviation approximately \sqrt{\frac{p(1-p)}{n}}
Because p is unknown, we estimate that standard deviation using the sample proportion. The estimated standard deviation of \hat{p} is called the standard error:
SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
A confidence interval is basically:
\text{estimate} \pm \text{critical value} \times \text{standard error}
For proportions, the estimate is \hat{p}. The critical value comes from the standard Normal distribution (a z value) for the chosen confidence level.
Example (conceptual, before computing)
Suppose you sample voters and find \hat{p} = 0.56 support a candidate. If you used a small sample, your CI might be wide (like 0.46 to 0.66). With a larger sample, your CI might be narrow (like 0.53 to 0.59). Both are centered near 0.56, but they communicate very different levels of certainty.
Exam Focus
- Typical question patterns:
- Explain what “95% confidence” means in context.
- Identify p and \hat{p} from a scenario and describe what a CI is estimating.
- Compare two CIs (different widths) and interpret which estimate is more precise.
- Common mistakes:
- Saying “95% probability p is in the interval” instead of the long-run method interpretation.
- Confusing the parameter p (population) with the statistic \hat{p} (sample).
- Treating a wider interval as “more accurate”; it’s usually less precise, though it may come from a higher confidence level.
Constructing a Confidence Interval for a Population Proportion
When you use a one-proportion z interval
A one-proportion z interval is used when:
- You have one sample from a population (or one randomized experiment group if you’re just estimating a single proportion), and
- The variable is categorical with two outcomes (success/failure), and
- You want to estimate the population proportion p.
Examples:
- Proportion of students at a school who approve of a policy.
- Proportion of components that are defective.
- Proportion of adults who favor a certain law.
Conditions (the logic behind them)
AP Statistics emphasizes checking conditions before doing inference. These conditions justify the Normal approximation and ensure the sampling method supports generalization.
1) Random condition
You need data from a random sample (e.g., SRS) or a random assignment mechanism (in an experiment). For a CI about a population proportion, it’s usually a random sample.
Why it matters: Without random sampling, the inference might be biased—your CI could systematically miss p.
2) Independence condition (10% condition)
If sampling without replacement from a finite population, a common guideline is:
- n should be no more than 10% of the population size.
Why it matters: This makes observations approximately independent; without it, the variability calculations can be off.
3) Large counts condition (Normal approximation)
For the one-proportion z interval, you typically check that the numbers of successes and failures in the sample are large enough:
- n\hat{p} \ge 10 and n(1-\hat{p}) \ge 10
Why it matters: These conditions help ensure the sampling distribution of \hat{p} is approximately Normal.
A common misconception is to check this condition with p, which is unknown. For confidence intervals in AP Statistics, you use \hat{p} in the large counts check.
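The check itself is simple arithmetic. As a minimal sketch (the helper name below is my own, not standard terminology):

```python
# Large counts check using p-hat, as done for CIs in AP Statistics.
def large_counts_ok(x, n, threshold=10):
    """True if both the success count and the failure count are at least `threshold`."""
    p_hat = x / n
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

print(large_counts_ok(270, 500))  # 270 successes, 230 failures -> True
print(large_counts_ok(4, 50))     # only 4 successes -> False
```

Note that n\hat{p} is just the observed success count x, and n(1-\hat{p}) is the failure count n - x, so the check amounts to counting successes and failures.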
The confidence interval formula (one-proportion z interval)
Once conditions are met, a CI for p is:
\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
Where:
- \hat{p} is the sample proportion.
- z^* is the critical value from the standard Normal distribution matching the confidence level (for a two-sided interval).
- n is the sample size.
Common z^* values you’ll often use:
| Confidence level | Critical value z^* (approx.) |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
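If you want to verify these critical values yourself, Python's standard library can compute them (a side note for the curious, not something required on the exam). The two-sided z^* for confidence level C is the (1 + C)/2 quantile of the standard Normal distribution:

```python
# Reproducing the z* table with the standard library (no SciPy needed).
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value: the upper (1 + C)/2 quantile of the standard Normal."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z* = {z_star(c):.3f}")  # 1.645, 1.960, 2.576
```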
How to build the interval step by step (the process)
A reliable way to construct and communicate a CI is to follow a consistent structure:
1) Define the parameter in context.
- Example: p = the true proportion of all registered voters in the state who support the candidate.
2) Check conditions (random, independence/10%, large counts).
3) Calculate the statistic \hat{p} = x/n.
4) Choose z^* based on the confidence level.
5) Compute the standard error using \hat{p}.
6) Compute the margin of error:
ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
7) Write the interval as \hat{p} \pm ME or as lower/upper bounds.
8) Interpret the interval in context (this is not optional on AP problems).
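The computational steps (3, 5, 6, and 7) can be sketched as one function. This is a Python illustration with made-up data, and the function name is my own; it does not replace stating the parameter, checking conditions, or interpreting in context:

```python
# Steps 3-7 of the process as a single helper (illustrative sketch).
import math

def one_prop_z_interval(x, n, z_star=1.96):
    """Return (p_hat, margin_of_error, lower, upper) for a one-proportion z interval."""
    p_hat = x / n                                # step 3: sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # step 5: standard error
    me = z_star * se                             # step 6: margin of error
    return p_hat, me, p_hat - me, p_hat + me     # step 7: the interval

# Hypothetical data: 120 successes in a sample of 300.
p_hat, me, lower, upper = one_prop_z_interval(120, 300)
print(f"{p_hat} +/- {me:.4f} -> ({lower:.4f}, {upper:.4f})")
```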
Worked Example 1: Constructing and interpreting a 95% CI
A polling organization takes an SRS of 500 adults and finds that 270 support a new environmental policy.
1) Parameter
Let p be the true proportion of all adults in the population who support the policy.
2) Conditions
- Random: The sample is an SRS (given).
- Independence: 500 adults is almost certainly less than 10% of all adults in the population.
- Large counts:
- n\hat{p} = 500\times(270/500)=270 \ge 10
- n(1-\hat{p}) = 500\times(230/500)=230 \ge 10
3) Compute \hat{p}
\hat{p} = \frac{270}{500} = 0.54
4) Use z^* for 95% confidence
z^* = 1.96
5) Standard error
SE = \sqrt{\frac{0.54(1-0.54)}{500}} = \sqrt{\frac{0.54\times0.46}{500}}
Compute the inside:
\frac{0.2484}{500} = 0.0004968
So:
SE = \sqrt{0.0004968} \approx 0.0223
6) Margin of error
ME = 1.96(0.0223) \approx 0.0437
7) Confidence interval
0.54 \pm 0.0437
Lower bound: 0.4963
Upper bound: 0.5837
8) Interpretation (correct AP-style wording)
We are 95% confident that the true proportion of all adults in the population who support the environmental policy is between about 0.496 and 0.584.
Notice what’s being estimated: the population proportion, not the sample.
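As a quick check, the arithmetic in this example can be reproduced in a few lines of Python (illustrative only; the numbers are the ones given above):

```python
# Reproducing Worked Example 1: x = 270, n = 500, 95% confidence.
import math

p_hat = 270 / 500                              # 0.54
se = math.sqrt(p_hat * (1 - p_hat) / 500)      # about 0.0223
me = 1.96 * se                                 # about 0.0437
print(f"p-hat = {p_hat:.2f}, SE = {se:.4f}, ME = {me:.4f}")
print(f"95% CI: ({p_hat - me:.4f}, {p_hat + me:.4f})")
```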
Worked Example 2: How confidence level changes the width
Using the same data as above (\hat{p}=0.54, n=500), suppose you want a 99% CI.
For 99% confidence, z^* \approx 2.576.
The standard error stays the same (it depends on the data and n), so only the margin of error changes:
ME_{99} = 2.576(0.0223) \approx 0.0574
So the 99% CI would be:
0.54 \pm 0.0574
This interval is wider than the 95% CI. That’s the general tradeoff:
- Higher confidence level \Rightarrow larger z^* \Rightarrow larger margin of error \Rightarrow wider interval.
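The same comparison can be run for several confidence levels at once, as a short Python sketch using the data above (\hat{p} = 0.54, n = 500):

```python
# How the margin of error grows with the confidence level, same data throughout.
import math
from statistics import NormalDist

p_hat, n = 0.54, 500
se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error is fixed by the data
for c in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + c) / 2)      # two-sided critical value z*
    print(f"{c:.0%}: ME = {z * se:.4f}, width = {2 * z * se:.4f}")
```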
Common pitfalls while constructing proportion intervals
- Forgetting to check conditions: AP questions often award points for stating/checking them.
- Using a t critical value: For one-proportion CIs in AP Stats, you use z^* (the t distribution is used for means when \sigma is unknown).
- Mismatching n and x: Double-check that \hat{p} is computed from the same sample size you use in the standard error.
- Reporting the interval without context: You should name the population and the characteristic clearly.
Exam Focus
- Typical question patterns:
- “Construct and interpret a 95% confidence interval for p.” (Often graded on parameter, conditions, calculation, interpretation.)
- “Do the conditions for inference hold?” followed by “If so, compute the interval.”
- Compare intervals from two different samples (which is narrower, which indicates more precision?).
- Common mistakes:
- Using the wrong large counts check (or skipping it) and proceeding when counts are too small.
- Confusing standard deviation of a population with standard error of \hat{p}.
- Interpreting “confidence” as certainty about the sample instead of uncertainty about the population.
Interpreting Confidence Intervals and Determining Sample Size
How to interpret a confidence interval well
A strong AP Statistics interpretation has three parts:
1) The confidence level (e.g., 95%)
2) The parameter p stated in context
3) The interval of plausible values
A template that works well:
“We are [confidence level] confident that the true proportion of [population] who [have characteristic] is between [lower bound] and [upper bound].”
That phrasing keeps you focused on the population parameter, not the sample statistic.
Interpreting the margin of error (precision)
The margin of error is the “plus/minus” part:
ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
Conceptually, the margin of error is how far your interval extends from \hat{p} on either side. Smaller margin of error means a more precise estimate.
Three levers affect ME:
- Confidence level: bigger z^* makes ME bigger.
- Sample size: bigger n makes ME smaller (but with diminishing returns because of the square root).
- Estimated variability: \hat{p}(1-\hat{p}) is largest near 0.5 and smaller near 0 or 1, so proportions near 0.5 tend to have larger standard errors (for fixed n).
A useful intuition: estimating a “coin near fair” (true proportion near 0.5) is harder than estimating a “coin that almost always lands heads” (proportion near 0.95), given the same number of flips.
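These levers are easy to see numerically. A small Python sketch (the sample sizes below are arbitrary illustrations):

```python
# Two of the margin-of-error levers in action.
import math

def margin_of_error(p_hat, n, z_star=1.96):
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Quadrupling n only halves the margin of error (square-root diminishing returns):
print(margin_of_error(0.5, 250), margin_of_error(0.5, 1000))
# For fixed n, a proportion near 0.5 is harder to pin down than one near 0.95:
print(margin_of_error(0.5, 500), margin_of_error(0.95, 500))
```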
What “95% confident” really means (procedure language)
If you want to be extra precise (and AP readers like this), you can interpret confidence in terms of the method:
“If we were to take many random samples of size n from this population and build a 95% confidence interval from each sample using this method, about 95% of those intervals would contain the true population proportion p.”
You typically wouldn’t write this every time, but understanding it prevents the biggest misconceptions.
When a confidence interval gives evidence about a claim
Confidence intervals can be used to assess the plausibility of particular values of p.
Example idea: If someone claims p = 0.50, and your 95% CI is (0.52, 0.60), then 0.50 is not in the interval—so 0.50 is not a plausible value given the data at the 95% confidence level (this connects to a two-sided hypothesis test at a related significance level).
Be careful: This is not the same as proving the claim false with certainty. It’s evidence based on sample data.
Determining sample size for a desired margin of error
Sometimes you plan a study and want a confidence interval with a target precision. In that case, you choose n so that the margin of error is at most some value m.
Start from the margin of error formula and solve for n. For planning, you use an anticipated value of p (call it p^*) because \hat{p} isn’t available yet.
The sample size formula for a proportion (planning for a CI) is:
n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)
Where:
- z^* is based on your confidence level.
- m is the desired margin of error (as a proportion, not a percent).
- p^* is your best guess for the true proportion.
After calculating, you round up to the next whole number because sample size must be an integer and rounding down could produce a margin of error larger than desired.
What if you don’t have a prior estimate of p?
If you have no reasonable prior guess, a conservative approach is to use:
p^* = 0.5
because p(1-p) is maximized at 0.5, producing the largest required n. This guarantees your sample size is large enough for the desired margin of error no matter what the true proportion is.
Worked Example 3: Finding required sample size
You want to estimate the proportion of students at a large university who would support a new campus transportation fee. You want a 95% confidence interval with margin of error at most 0.03.
You don’t have prior information, so use p^* = 0.5. For 95% confidence, z^* = 1.96, and m = 0.03.
Use:
n = \left(\frac{1.96}{0.03}\right)^2 (0.5)(0.5)
Compute step by step:
\left(\frac{1.96}{0.03}\right) \approx 65.3333
\left(65.3333\right)^2 \approx 4268.4444
Multiply by 0.25:
n \approx 1067.1111
Round up:
n = 1068
Interpretation: You should sample at least 1068 students (randomly) to get a 95% CI with margin of error no more than 0.03.
Worked Example 4: Using a prior estimate to reduce required n
Suppose earlier surveys suggest about 20% of students support the fee, so use p^* = 0.20.
n = \left(\frac{1.96}{0.03}\right)^2 (0.20)(0.80)
We already computed \left(\frac{1.96}{0.03}\right)^2 \approx 4268.4444.
Now multiply by 0.16:
n \approx 4268.4444(0.16) \approx 682.9511
Round up:
n = 683
Using a realistic prior estimate can reduce the needed sample size—because you’re planning for less variability than the worst-case 0.5.
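Both sample-size calculations above can be checked with a short function (a Python sketch; the function name is my own). Note the rounding up with `math.ceil`, which matches the rule stated earlier:

```python
# Checking Worked Examples 3 and 4: smallest whole n with planned ME at most m.
import math

def required_n(z_star, m, p_star=0.5):
    """Round (z*/m)^2 * p*(1 - p*) up to the next whole number."""
    return math.ceil((z_star / m) ** 2 * p_star * (1 - p_star))

print(required_n(1.96, 0.03))         # conservative p* = 0.5  -> 1068
print(required_n(1.96, 0.03, 0.20))   # prior estimate p* = 0.20 -> 683
```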
Common interpretation mistakes (and how to fix them)
- Talking about individuals instead of the parameter: The CI is about a proportion in the population, not about whether a specific person supports something.
- Interpreting confidence as success for this one interval: Your computed interval is not “95% likely to be right”; rather, the method is right 95% of the time in repeated sampling.
- Forgetting the population: A sample of “students in Stats class” can’t justify an inference about “all students at the university.” Your interpretation should match the actual sampling frame.
Notation and vocabulary reminders
Because AP questions expect correct statistical language, it helps to keep symbols straight:
| Concept | Symbol | Meaning |
|---|---|---|
| Population proportion | p | True proportion in the population |
| Sample proportion | \hat{p} | Proportion observed in the sample |
| Sample size | n | Number of individuals in the sample |
| Success count | x | Number with the characteristic |
| Standard error of \hat{p} | \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} | Estimated SD of \hat{p} |
| Critical value | z^* | From Normal distribution for confidence level |
| Margin of error | ME | Half-width of the CI |
Exam Focus
- Typical question patterns:
- Write an interpretation of a CI “in context” (often worth a full rubric point).
- Determine the minimum sample size for a given confidence level and margin of error.
- Explain how changing confidence level or sample size affects interval width.
- Common mistakes:
- Not rounding sample size up, or treating the computed n as flexible.
- Using \hat{p} in sample size planning even though no sample has been taken (you should use p^* or 0.5).
- Saying that a higher confidence level makes the interval “more accurate” without acknowledging it becomes wider (less precise).