Unit 5 Notes: Understanding Sampling Distributions (Proportions and Means)
Sampling Distribution of a Sample Proportion
What a sampling distribution is (and why you care)
When you take a sample, you usually compute some statistic (a number calculated from the sample) like a sample proportion or a sample mean. If you took a different random sample of the same size from the same population, you would almost certainly get a different statistic.
A sampling distribution describes this idea formally: it is the distribution of a statistic over all possible random samples of a fixed size from the population. In real life you can almost never list "all possible samples," but thinking this way is powerful because it lets you predict how much your statistic tends to vary.
This matters because inference (confidence intervals and significance tests) is built on one key question:
- If the population parameter were really p (or \mu), how likely is it that random sampling would produce the statistic you observed?
To answer that, you need the sampling distribution.
The statistic: sample proportion
The sample proportion is the fraction of sampled individuals who have a certain characteristic (“success”). Notation:
- Parameter (fixed, population): p = true population proportion
- Statistic (random, sample): \hat{p} = sample proportion
If you take a sample of size n and count successes X, then:
\hat{p} = \frac{X}{n}
The value of \hat{p} changes from sample to sample, so \hat{p} has a sampling distribution.
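A quick simulation makes this concrete (a minimal Python sketch, not part of the notes; the population proportion p = 0.30 and sample size n = 100 are chosen only for illustration):

```python
import random

random.seed(1)

p = 0.30   # true population proportion (assumed for illustration)
n = 100    # sample size

def sample_proportion(p, n):
    """Draw a simple random sample of size n and return p-hat = X / n."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

# Two different random samples of the same size give two statistics.
phat1 = sample_proportion(p, n)
phat2 = sample_proportion(p, n)
print(phat1, phat2)  # two sample proportions, both near 0.30
```

Run it a few times without the seed and the values keep changing; the pattern of that variation is exactly the sampling distribution.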
How the sampling distribution of \hat{p} behaves
Under random sampling, the sampling distribution of \hat{p} has a predictable center and spread.
Center (mean)
The sampling distribution is centered at the true population proportion:
\mu_{\hat{p}} = p
This is why \hat{p} is called an **unbiased estimator** of p: over many random samples, it does not systematically overestimate or underestimate the true proportion.
Spread (standard deviation)
The standard deviation of the sampling distribution of \hat{p} is:
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
Interpretation:
- Larger n makes \sigma_{\hat{p}} smaller, so sample proportions cluster more tightly around p.
- Proportions near 0.5 have the largest variability because p(1-p) is largest near 0.5.
In practice, you usually don’t know p, so when you estimate the spread from data you use a related idea called the **standard error** (often using \hat{p} in place of p). But for probability questions about the sampling distribution (when p is given), you use \sigma_{\hat{p}}.
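Both formulas can be checked empirically (a simulation sketch, with p = 0.40 and n = 50 assumed for illustration):

```python
import math
import random
import statistics

random.seed(2)

p, n = 0.40, 50      # assumed population proportion and sample size
trials = 20_000      # number of simulated samples

phats = []
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # count successes
    phats.append(x / n)

# Theory: mean of p-hat is p; SD of p-hat is sqrt(p(1-p)/n).
theory_sd = math.sqrt(p * (1 - p) / n)   # about 0.0693
print(statistics.mean(phats))            # close to 0.40
print(statistics.stdev(phats))           # close to theory_sd
```

The simulated mean and standard deviation of the 20,000 sample proportions land very close to p and \sigma_{\hat{p}}, which is what the formulas predict.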
Shape: when is \hat{p} approximately Normal?
The distribution of \hat{p} becomes approximately Normal when the sample is large enough that you expect at least 10 successes and 10 failures (using the population proportion p):
- np \ge 10
- n(1-p) \ge 10
This is commonly called the Large Counts condition. When it holds, you can use Normal probability calculations with:
\hat{p} \approx N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)
This approximation is closely connected to the Central Limit Theorem (CLT), but AP Statistics often treats proportions with this specific “large counts” rule.
Conditions you must check (why they exist)
Sampling distribution results rely on how the sample was obtained.
Random condition: The data should come from a random sample or randomized experiment. Without randomness, the probability model for the statistic is not trustworthy.
Independence condition: Outcomes should be (approximately) independent.
- When sampling without replacement from a finite population, independence is approximately true if the sample is not too large.
- AP Statistics uses the 10% condition: n \le 0.1N, where N is the population size.
Large Counts condition (for Normal approximation): np \ge 10 and n(1-p) \ge 10.
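These checks are mechanical enough to script. Here is a small helper (the function name is hypothetical, written just for these notes):

```python
def normal_model_ok(p, n, N=None):
    """Check Large Counts (np >= 10 and n(1-p) >= 10) and, when the
    population size N is supplied, the 10% condition (n <= 0.1 * N)."""
    large_counts = n * p >= 10 and n * (1 - p) >= 10
    ten_percent = True if N is None else n <= 0.1 * N
    return large_counts and ten_percent

print(normal_model_ok(0.08, 200, N=10_000))  # True: expect 16 successes, 184 failures
print(normal_model_ok(0.02, 100))            # False: expect only 2 successes
```

Note that a condition can fail for either reason: `normal_model_ok(0.5, 30, N=100)` fails because 30 is more than 10% of 100, even though the expected counts are fine.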
A common misconception is thinking that “random” automatically means “independent.” Random sampling without replacement creates dependence when the sample is a big fraction of the population, which is exactly why the 10% condition matters.
Worked example: probability involving \hat{p}
A manufacturer knows that the true defect rate is p = 0.08. A quality engineer randomly samples n = 200 items. What is the probability the sample defect proportion is more than 0.12?
Step 1: Check conditions
- Random: assume yes (random sample).
- 10%: if the population is large enough that 200 \le 0.1N, independence is reasonable.
- Large Counts:
- np = 200(0.08) = 16
- n(1-p) = 200(0.92) = 184
Both are at least 10, so Normal approximation is reasonable.
Step 2: Find mean and standard deviation
\mu_{\hat{p}} = 0.08
\sigma_{\hat{p}} = \sqrt{\frac{0.08(0.92)}{200}}
Compute:
\sigma_{\hat{p}} = \sqrt{\frac{0.0736}{200}} = \sqrt{0.000368} \approx 0.0192
Step 3: Convert to a z-score and use Normal probability
z = \frac{0.12 - 0.08}{0.0192} \approx 2.08
So:
P(\hat{p} > 0.12) \approx P(Z > 2.08)
Using standard Normal probabilities, this is about 0.019.
What this means: Even with a true defect rate of 8%, you will occasionally see a sample as high as 12% just due to random sampling variability—but it’s fairly rare (about 2%).
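The whole calculation can be verified with Python's built-in Normal distribution (`statistics.NormalDist`, available in Python 3.8+):

```python
import math
from statistics import NormalDist

p, n = 0.08, 200
mu = p
sigma = math.sqrt(p * (1 - p) / n)       # about 0.0192

# P(p-hat > 0.12) under the Normal approximation
prob = 1 - NormalDist(mu, sigma).cdf(0.12)
print(round(sigma, 4), round(prob, 3))   # 0.0192 and about 0.019
```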
Notation and meaning: parameter vs statistic (quick reference)
| Concept | Population (parameter) | Sample (statistic) | Sampling distribution center | Sampling distribution spread |
|---|---|---|---|---|
| Proportion | p | \hat{p} | p | \sqrt{\frac{p(1-p)}{n}} |
Exam Focus
- Typical question patterns:
- “Assuming p is ___ and n is ___, find P(\hat{p} > \text{value})” (or between two values).
- “Verify conditions for using a Normal model for \hat{p}, then compute a probability.”
- “Explain how increasing n changes the sampling distribution of \hat{p}.”
- Common mistakes:
- Using the Large Counts check with \hat{p} when the problem gives p for a probability calculation (for this type of question, use p).
- Forgetting the 10% condition and treating sampling without replacement as independent when n is a large fraction of N.
- Confusing the distribution of the data (0/1 outcomes) with the distribution of \hat{p} (a proportion that can take many values).
Sampling Distribution of a Sample Mean
What changes when you move from proportions to means
A proportion is built from yes/no outcomes. A mean is built from numerical measurements (heights, waiting times, scores). The central idea is the same: if you repeatedly take random samples of size n and compute the sample mean each time, those means form a distribution.
- Parameter: \mu = population mean
- Statistic: \bar{x} = sample mean
If your sample is x_1, x_2, \dots, x_n, then:
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}
Because the sample changes, \bar{x} changes—so \bar{x} has a sampling distribution.
Center of the sampling distribution of \bar{x}
The sampling distribution is centered at the population mean:
\mu_{\bar{x}} = \mu
So \bar{x} is an **unbiased estimator** of \mu.
Spread: the standard deviation of \bar{x}
If the population standard deviation is \sigma, then:
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
This formula is one of the most important ideas in statistics: averaging reduces variability. Intuitively, individual observations bounce around a lot, but the average of many observations tends to be more stable.
You’ll often hear \sigma_{\bar{x}} called the **standard deviation of the sampling distribution** or (in inference contexts) the **standard error** of the mean when \sigma is estimated.
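The "averaging reduces variability" idea is easy to demonstrate by simulation (a sketch assuming a Normal population with \mu = 50 and \sigma = 10, values chosen only for illustration):

```python
import random
import statistics

random.seed(3)

mu, sigma = 50, 10       # assumed population parameters
trials = 10_000          # simulated samples per sample size

def mean_of_sample(n):
    """Sample mean of n draws from the population."""
    return statistics.mean(random.gauss(mu, sigma) for _ in range(n))

results = {}
for n in (4, 25, 100):
    results[n] = statistics.stdev(mean_of_sample(n) for _ in range(trials))
    print(n, round(results[n], 2))  # SDs near 10/sqrt(n): about 5, 2, 1
```

Quadrupling n halves the spread of \bar{x}, exactly as \sigma/\sqrt{n} predicts.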
Shape: when is \bar{x} Normal (or approximately Normal)?
The shape of the sampling distribution of \bar{x} depends on the population distribution and the sample size.
There are two main pathways to Normality:
- If the population is Normal, then \bar{x} is exactly Normal for any sample size n.
- If the population is not Normal, then \bar{x} becomes approximately Normal when n is large enough (this is the Central Limit Theorem, developed more fully in the next major section).
A key misconception is thinking the CLT makes the data Normal. It does not. The raw observations can remain skewed or weird; it’s the distribution of the sample mean that becomes approximately Normal as n grows.
Conditions to use the sampling distribution results for \bar{x}
As with proportions, you need randomness and (approximate) independence.
Random condition: data come from a random sample or randomized experiment.
Independence / 10% condition: when sampling without replacement from a finite population of size N, check:
n \le 0.1N
Normality condition for using a Normal model for \bar{x}: either
- Population is Normal, or
- Sample size is large enough for the CLT to apply (often "large" is around 30 in many textbook settings, but on the AP exam you justify based on context, skewness/outliers, and sample size rather than a single magic cutoff).
Worked example: probability involving \bar{x}
Suppose the amount of soda filled into bottles has population mean \mu = 500 mL and population standard deviation \sigma = 4 mL. A random sample of n = 36 bottles is selected. What is the probability the sample mean fill is less than 498.5 mL?
Step 1: Check conditions
- Random: assume a random sample.
- 10%: plausible if the day’s production is far more than 360 bottles.
- Shape: even if the fill amounts are not perfectly Normal, n = 36 is reasonably large; in many realistic manufacturing settings the distribution is roughly symmetric anyway.
Step 2: Compute the mean and standard deviation of \bar{x}
\mu_{\bar{x}} = 500
\sigma_{\bar{x}} = \frac{4}{\sqrt{36}} = \frac{4}{6} = 0.6667
Step 3: Convert to z-score and compute probability
z = \frac{498.5 - 500}{0.6667} = \frac{-1.5}{0.6667} = -2.25
So:
P(\bar{x} < 498.5) \approx P(Z < -2.25)
This is about 0.012.
Interpretation: Even though individual bottles vary with standard deviation 4 mL, the average of 36 bottles varies much less (standard deviation about 0.67 mL). So getting a sample mean as low as 498.5 mL is rare.
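As before, the Normal-model arithmetic can be double-checked with `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 4, 36
sd_xbar = sigma / math.sqrt(n)             # about 0.6667 mL

prob = NormalDist(mu, sd_xbar).cdf(498.5)  # P(xbar < 498.5)
print(round(sd_xbar, 4), round(prob, 3))   # 0.6667 and about 0.012
```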
Connecting back to inference
When you build a confidence interval for \mu or run a test about \mu, you’re standardizing how far your sample mean is from the hypothesized mean using the typical sampling variability of \bar{x}. This is why the sampling distribution formulas show up everywhere in Unit 6 and beyond.
Exam Focus
- Typical question patterns:
- “Given \mu, \sigma, and n, find P(\bar{x} > \text{value}) or between two values.”
- “Describe the sampling distribution of \bar{x}: shape, center, spread (with conditions).”
- “How does changing n affect the distribution of \bar{x}?”
- Common mistakes:
- Using \sigma_{\bar{x}} = \sigma/n instead of dividing by \sqrt{n}.
- Treating \sigma (population SD) as if it shrinks with larger samples; it’s the SD of \bar{x} that shrinks.
- Ignoring strong skew/outliers with small n and claiming Normality without justification.
Central Limit Theorem
What the CLT says in plain language
The Central Limit Theorem (CLT) is a foundational result that explains why Normal models show up so often.
It says: if you take many random samples of size n from a population with mean \mu and standard deviation \sigma, then as n becomes large, the sampling distribution of the sample mean \bar{x} becomes approximately Normal—no matter what the population distribution looks like (as long as it has a well-defined mean and standard deviation).
The CLT is not magic; it’s a statement about what happens when you average many independent pieces of randomness. Extreme values tend to get “smoothed out” by averaging.
Why the CLT matters
Without the CLT, you would often need to know the exact population distribution to do probability calculations or inference about means. The CLT lets you use Normal-based methods broadly because it provides a justification for why \bar{x} is approximately Normal in many real settings.
That “approximately” is crucial. The approximation can be excellent, decent, or poor depending on the situation.
How the CLT works conceptually (what improves the approximation)
Several factors affect how quickly the sampling distribution becomes close to Normal:
- Sample size n: bigger n generally makes the sampling distribution more Normal.
- Population shape:
- If the population is already close to Normal, even small n works well.
- If the population is strongly skewed or has outliers, you generally need a larger n.
- Independence: the CLT assumes observations behave like independent draws. This is why random sampling and the 10% condition matter.
A helpful analogy: imagine adding up many small random “nudges.” No single nudge determines the final result; the total becomes more predictable in shape.
Formal statement for the mean
For sufficiently large n:
\bar{x} \approx N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)
More explicitly:
\mu_{\bar{x}} = \mu
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
and the shape is approximately Normal when CLT conditions are met.
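To see the theorem in action, simulate sample means from a strongly right-skewed population. This sketch uses an exponential population with mean 2 (so its SD is also 2), chosen only because it is skewed and easy to sample from:

```python
import math
import random
import statistics

random.seed(4)

mu = sigma = 2.0         # exponential population: mean = SD = 2 (illustrative)
n, trials = 40, 20_000

# Each trial: draw n exponential observations and record their mean.
means = [statistics.mean(random.expovariate(1 / mu) for _ in range(n))
         for _ in range(trials)]

# CLT prediction: center mu, spread sigma / sqrt(n), approximately Normal shape
print(round(statistics.mean(means), 3))   # near 2.0
print(round(statistics.stdev(means), 3))  # near 2 / sqrt(40), about 0.316
```

A histogram of `means` would look roughly bell-shaped even though every individual observation comes from a sharply skewed distribution.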
CLT and proportions: how they connect
A sample proportion \hat{p} is the mean of 0-1 indicator variables (1 for success, 0 for failure). Because of that, a CLT-type result applies to proportions too. In AP Statistics, this is operationalized through the Large Counts condition:
- np \ge 10
- n(1-p) \ge 10
When those expected counts are large enough, the sampling distribution of \hat{p} is approximately Normal.
So you can think of the “Normal approximation for \hat{p}” as a special case of the CLT applied to Bernoulli trials.
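The "proportion is a mean of 0-1 outcomes" claim is easy to check directly (hypothetical data, made up for illustration):

```python
import statistics

# A sample of 0/1 outcomes (1 = success); hypothetical data
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

x = sum(outcomes)                 # number of successes
n = len(outcomes)

phat = x / n                      # sample proportion
mean_of_indicators = statistics.mean(outcomes)

print(phat, mean_of_indicators)   # identical: 0.4 and 0.4
```

Because \hat{p} literally is a sample mean, the CLT machinery for means carries over to proportions.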
Worked example: using CLT when the population is skewed
Suppose the waiting time at a busy café has mean \mu = 6.8 minutes and standard deviation \sigma = 4.5 minutes. The distribution of individual waiting times is strongly right-skewed (a few very long waits).
A manager takes a random sample of n = 50 customers and computes the mean waiting time. Approximate the probability that the sample mean exceeds 8 minutes.
Step 1: Justify CLT use
- Random sample: assume yes.
- Independence: assume the sample is less than 10% of all customers in the time period of interest.
- Population is skewed, but n = 50 is fairly large, so CLT gives a reasonable Normal approximation for \bar{x}.
Step 2: Describe sampling distribution of \bar{x}
\mu_{\bar{x}} = 6.8
\sigma_{\bar{x}} = \frac{4.5}{\sqrt{50}}
Compute:
\sigma_{\bar{x}} \approx \frac{4.5}{7.071} \approx 0.636
Step 3: Compute z-score
z = \frac{8 - 6.8}{0.636} \approx 1.89
So:
P(\bar{x} > 8) \approx P(Z > 1.89)
This is about 0.029.
Important interpretation: Individual waits are very skewed, but the average of 50 waits is much less skewed, and a Normal model for \bar{x} is often usable.
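Checking the café example numerically with `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 6.8, 4.5, 50
sd_xbar = sigma / math.sqrt(n)              # about 0.636

prob = 1 - NormalDist(mu, sd_xbar).cdf(8)   # P(xbar > 8)
print(round(sd_xbar, 3), round(prob, 4))    # 0.636 and about 0.0297
```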
What goes wrong: common CLT misunderstandings
- "If n is large, the data are Normal." Wrong. The CLT is about the distribution of \bar{x} (or sums/averages), not the distribution of individual observations.
- "n = 30 always guarantees Normal." Not guaranteed. It depends on the population shape. For extremely skewed data with outliers, you may need larger n for a good approximation.
- Forgetting independence. If data are dependent (for example, sampling a large fraction without replacement, or measuring repeated outcomes from the same individual), CLT conclusions can fail or require more advanced methods.
A brief note on sums (same idea, different scale)
Sometimes problems are phrased in terms of the sum of observations rather than the mean. If S = x_1 + x_2 + \cdots + x_n, then (under similar conditions):
\mu_S = n\mu
\sigma_S = \sigma\sqrt{n}
The sum and the mean contain the same information (since \bar{x} = S/n), but they have different units and different spreads.
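Using the bottle-filling numbers from earlier (\mu = 500, \sigma = 4, n = 36) shows that the sum and the mean lead to the same standardized value:

```python
import math

mu, sigma, n = 500, 4, 36

# Mean version: P(xbar < 498.5)
sd_xbar = sigma / math.sqrt(n)             # 4/6
z_mean = (498.5 - mu) / sd_xbar

# Sum version: S = n * xbar, so the equivalent event is S < 36 * 498.5 = 17946
mu_S = n * mu                              # 18000
sd_S = sigma * math.sqrt(n)                # 24
z_sum = (n * 498.5 - mu_S) / sd_S

print(round(z_mean, 2), round(z_sum, 2))   # both -2.25
```

Same z-score, same probability: the sum formulation just rescales the mean formulation.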
Exam Focus
- Typical question patterns:
- “Population is skewed with given \mu and \sigma. For sample size n, approximate a probability about \bar{x} using the CLT.”
- “Explain why the sampling distribution of \bar{x} is approximately Normal even though the population is not.”
- “Decide whether a Normal approximation is reasonable and justify using conditions (random, 10%, and sample size vs skew/outliers).”
- Common mistakes:
- Claiming CLT applies without mentioning randomness/independence.
- Using CLT language for proportions but forgetting the Large Counts check.
- Mixing up the standard deviation of individuals \sigma with the standard deviation of the sample mean \sigma_{\bar{x}}.