AP Statistics Unit 3 Notes: How to Collect Data with Good Samples

Planning a Study: Populations, Samples, and Generalizability

When you collect data, you’re usually trying to learn something about a big group without measuring every single member of that group. The entire purpose of sampling is efficiency—but efficiency is only valuable if your conclusions are trustworthy. This section is about learning how to plan a study so that the data you collect can legitimately answer your question.

Populations, samples, and the parameter–statistic idea

A population is the entire group you want to learn about. A sample is the subset of that population you actually collect data from. In AP Statistics, the goal of sampling is typically to use a sample to estimate something about the population.

A helpful way to keep your thinking organized is:

  • A parameter is a numerical summary of a population (often unknown).
  • A statistic is a numerical summary computed from a sample (known once you collect data).

For example, if you want to know the proportion of all students at a school who get at least 8 hours of sleep, the population parameter is the true proportion for all students (unknown), and your sample statistic is the proportion among the students you surveyed.
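A quick numeric sketch makes the distinction concrete (the counts here are invented for illustration):

```python
# Hypothetical survey results: 52 of 80 sampled students report >= 8 hours of sleep.
surveyed = 80
at_least_8 = 52

p_hat = at_least_8 / surveyed  # sample STATISTIC: known once data are collected
print(round(p_hat, 3))         # 0.65

# The population PARAMETER p (true proportion among ALL students at the school)
# is still unknown; p_hat is only an estimate of it.
```

The statistic is computable from data in hand; the parameter never appears in the code because you never observe it directly.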

Why this matters: most mistakes in sampling come from mixing up “who you want to know about” (the population) and “who you actually got data from” (the sample). If those two don’t align, your conclusions can fail even if you compute statistics perfectly.

Defining the population precisely (the question drives the population)

A common AP Statistics skill is translating a real-world question into a clear population definition. You should specify:

  • Who: the units (people, households, widgets, etc.)
  • Where: location if relevant
  • When: time frame if relevant

Example: “Do teens prefer streaming to cable?” is too vague. A better target might be: “All students enrolled in grades 9–12 in public high schools in California in 2026.”

This precision matters because “population” is not automatically “everyone.” It’s everyone relevant to your question.

Census vs. sample (and why you still often sample)

A census measures every member of the population. That sounds ideal, but it’s often impossible or impractical due to:

  • Cost and time
  • Inaccessibility (you can’t find everyone)
  • Measurement burden (people won’t respond carefully)

Also, a census can still be wrong if measurement is poor (bad wording, inaccurate responses). So “census” does not automatically mean “bias-free.”

Sampling frame: the “list you can actually reach”

A sampling frame is the list or mechanism that identifies all the units you can realistically sample from (for example, a roster of students, a voter registration list, a customer database).

The sampling frame matters because most real bias problems start here: if the sampling frame doesn’t match the population, some groups are systematically left out.

Generalizability: when can you extend results to the population?

Generalizability means using results from a sample to make conclusions about the population.

In AP Statistics, the key idea is:

  • If your sample is selected using a random sampling method from the population (or from a sampling frame that closely matches the population), then it is reasonable to generalize from the sample to the population.

Generalizability is about who you can talk about. It is different from causation, which is about whether a treatment caused a change (that’s tied to random assignment in experiments, not random sampling).

A good mental model:

  • Random sampling supports generalizing to a population.
  • Random assignment supports cause-and-effect conclusions.

Worked example: identifying population, sample, and whether you can generalize

Scenario: A school counselor emails a stress survey to all 1800 students. Only 240 students respond.

  1. Population of interest: all 1800 students (assuming the question is about the whole school).
  2. Intended sampling method: this was closer to a census attempt (contact everyone), but the data come only from respondents.
  3. Actual sample: the 240 students who responded.
  4. Can you generalize to all students? Caution: the respondents are likely different from non-respondents (nonresponse bias risk). Even though the counselor tried to contact everyone, the final sample is not a random sample of all 1800.

What goes wrong (common planning pitfalls)

One of the most common misconceptions is: “If the sample is big, it must be good.” Large samples reduce random variability, but they do not automatically fix bias. A huge convenience sample can still be systematically unrepresentative.

Another common issue is failing to define the population you want to generalize to. If the population is “all U.S. adults,” sampling only from one city (even randomly within that city) does not justify nationwide conclusions.

Exam Focus
  • Typical question patterns:
    • Define the population and describe the sample based on a scenario.
    • Decide whether results can be generalized to a stated population and justify using randomness (or explain why you can’t).
    • Identify the sampling frame and discuss how mismatch could create bias.
  • Common mistakes:
    • Confusing population with sample (or describing the sampling frame as the population).
    • Claiming “random assignment” allows generalization (it doesn’t; it supports causation).
    • Saying “large sample” guarantees validity without addressing bias.

Sampling Methods (SRS, Stratified, Cluster, Systematic)

A sampling method is the rule you use to choose which units end up in your sample. In AP Statistics, a “good” sampling method is typically one that uses randomness in a controlled way so that the sample tends to be representative and avoids systematic bias.

The big idea of random sampling

A random sample is selected by a chance process that gives units in the population a known (and typically equal) chance to be chosen. Randomness matters because it helps prevent your personal choices (or the ease of access) from shaping the sample.

You should separate two ideas:

  • Bias is a systematic tendency to overrepresent or underrepresent some outcomes.
  • Variability is natural random fluctuation from sample to sample.

Random sampling doesn’t guarantee a perfect sample every time, but it does make bias less likely and allows statistical inference methods later in the course to work as intended.

Simple Random Sample (SRS)

An SRS (Simple Random Sample) of size n from a population of size N is a sample selected so that every possible group of n individuals has an equal chance of being selected.

Why this matters: an SRS is the “gold standard” baseline. Many other methods are justified by comparing them to what an SRS would do.

How it works (typical AP procedure):

  1. Label each member of the population with a unique number from 1 to N.
  2. Use a random number generator (random digits table, calculator, or software) to select n distinct labels.
  3. The units with those labels form your sample.

Example (SRS):
A teacher wants an SRS of 5 students from a class of 28 to ask about homework time.

  • Step 1: Label students 01 to 28.
  • Step 2: Generate random two-digit numbers.
  • Step 3: Keep numbers 01–28, skip 00 and 29–99, and skip repeats.
  • Step 4: Stop when you have 5 valid, unique labels.
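The four steps above can be sketched in Python. The `srs_from_digits` helper is a hypothetical illustration, and the digit string stands in for one row of a random digits table:

```python
def srs_from_digits(digit_stream, n, N):
    """Mimic the random-digits-table procedure: read two-digit numbers left to
    right, keep labels 01..N, skip 00 / out-of-range numbers / repeats, and
    stop once n distinct labels are chosen."""
    chosen = []
    for i in range(0, len(digit_stream) - 1, 2):
        label = int(digit_stream[i:i + 2])
        if 1 <= label <= N and label not in chosen:  # keep 01-28, skip repeats
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Arbitrary stand-in for a digits-table row: pairs 41, 02, 53, 17, 26, 04, 18, ...
sample = srs_from_digits("410253172604181309", 5, 28)
print(sample)  # [2, 17, 26, 4, 18] -> students 02, 17, 26, 04, 18
```

Note how 41 and 53 are skipped (out of range) and the process stops as soon as five valid, unique labels are found.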

What can go wrong:

  • Forgetting to handle repeats (in an SRS without replacement, repeated random numbers should be ignored).
  • Using “random” methods that aren’t actually random (like “pick students who look representative”).

Stratified Random Sample

A stratified random sample starts by dividing the population into strata—groups that are similar within themselves on some important variable (like grade level, gender, or region). Then you take a random sample from each stratum (often an SRS within each stratum) and combine them.

Why this matters: stratifying can produce a more precise estimate than an SRS of the same size when the strata differ from each other in ways related to the variable you care about. It also ensures each key subgroup is represented.

How it works:

  1. Choose a stratification variable (one you believe is related to the response).
  2. Split the population into non-overlapping strata that cover the whole population.
  3. Randomly sample within each stratum.
  4. Combine all selected units.

A common approach is proportional stratification, where you sample each stratum in proportion to its size in the population. (AP questions often give numbers that make proportional allocation natural.)

Example (stratified):
A principal wants to estimate the proportion of students who feel safe at school. The school has 900 students: 300 freshmen, 250 sophomores, 200 juniors, 150 seniors. They want a sample of 90.

  • Proportional plan: sample 30 freshmen, 25 sophomores, 20 juniors, 15 seniors.
  • Within each grade, take an SRS of the needed number.
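The proportional plan can be checked with a short sketch (the grade sizes and the overall 10% sampling rate come straight from the scenario; the within-stratum SRS here just draws hypothetical ID numbers):

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

strata = {"freshmen": 300, "sophomores": 250, "juniors": 200, "seniors": 150}
total = sum(strata.values())  # 900
sample_size = 90

# Proportional allocation: each stratum's share of the sample matches its
# share of the population (90/900 = 10% of each grade).
allocation = {g: n * sample_size // total for g, n in strata.items()}
print(allocation)  # {'freshmen': 30, 'sophomores': 25, 'juniors': 20, 'seniors': 15}

# Then take an SRS of the allocated size within each stratum.
samples = {g: random.sample(range(1, n + 1), allocation[g])
           for g, n in strata.items()}
```

The final stratified sample combines the four within-grade SRSs, giving 30 + 25 + 20 + 15 = 90 students.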

What can go wrong:

  • Confusing stratified sampling with cluster sampling (strata are internally similar; clusters are mini-populations that are internally mixed).
  • Creating strata after sampling (the grouping is part of the design and should happen before selection).

Cluster Sample

A cluster sample divides the population into clusters that ideally look like mini-versions of the whole population—each cluster contains a mix of different types of individuals. Then you randomly select some clusters and include all individuals in the chosen clusters (or sometimes sample within chosen clusters in multistage designs, but the classic AP definition is “take all within selected clusters”).

Why this matters: cluster sampling is often used when a population is naturally grouped and it’s expensive or impractical to sample individuals spread out across the entire population. It saves time and travel.

How it works:

  1. Partition the population into clusters (e.g., classrooms, city blocks).
  2. Randomly select clusters.
  3. Survey every individual in the selected clusters.

Example (cluster):
A city wants to estimate average household water usage. They divide the city map into 200 blocks (clusters). They randomly select 10 blocks and measure water usage for all households on those blocks.
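A minimal sketch of the three steps, assuming a hypothetical `households_on` lookup (in practice that information comes from field work, and blocks would not all have the same number of households):

```python
import random

random.seed(11)  # fixed seed for reproducibility

# Step 1: partition the city into 200 blocks (clusters), labeled 1..200.
blocks = list(range(1, 201))

# Step 2: randomly select 10 clusters.
chosen_blocks = random.sample(blocks, 10)

# Step 3: measure EVERY household in each selected block.
def households_on(block):
    # Hypothetical lookup; pretend each block has exactly 5 households.
    return [f"block{block}-house{h}" for h in range(1, 6)]

sample = [house for b in chosen_blocks for house in households_on(b)]
print(len(sample))  # 50: all households in all 10 selected blocks
```

The randomness enters at the cluster level only; once a block is chosen, no one on it is skipped.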

What can go wrong:

  • Using clusters that are not representative mini-populations. If each block is mostly one type of household (e.g., only apartments downtown), then choosing a few blocks could bias results.
  • Claiming it’s “random” because clusters were picked randomly, while ignoring that the way clusters were formed matters.

Systematic Sample

A systematic sample selects units by choosing a random starting point and then taking every kth unit from an ordered list.

The step size is often:
k = N/n
where N is the population size (or list length) and n is the desired sample size.

Why this matters: systematic sampling is simple to implement (especially on production lines, queues, or alphabetized lists) and can spread the sample evenly across the frame.

How it works:

  1. Put the population in an ordered list (the “frame”).
  2. Compute k (often rounding to a convenient integer if needed, depending on context).
  3. Randomly select a start between 1 and k.
  4. Take that unit and then every kth after it.

Example (systematic):
A library has 1200 books checked out this month and wants a sample of 60 receipts to audit.

  • Compute k = 1200/60 = 20.
  • Randomly pick a start from 1 to 20, say 13.
  • Sample receipts numbered 13, 33, 53, 73, … until 60 receipts are chosen.
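The receipt audit can be sketched directly; only the random start varies from run to run, and every start from 1 to k yields exactly 60 receipts:

```python
import random

random.seed(5)  # fixed seed for reproducibility

N, n = 1200, 60
k = N // n                    # step size: 1200/60 = 20
start = random.randint(1, k)  # random start between 1 and k (never a fixed 1!)
sample = list(range(start, N + 1, k))  # that receipt, then every 20th after it

print(k, start, len(sample))
```

Because the start is random, each receipt still has chance 1/20 of selection, even though not every group of 60 receipts is possible (so this is not an SRS).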

What can go wrong:

  • Periodicity: if the list has a repeating pattern that lines up with k, you can accidentally select a biased set. For instance, if every 20th receipt comes from a particular branch due to how data are batched, the systematic sample could overrepresent that branch.
  • Forgetting the random start (starting at 1 is not random and can create bias if the order is meaningful).

Comparing the methods (how to choose)

Choosing a method is about balancing practicality with representativeness.

| Method | Core idea | When it's useful | Main risk if done poorly |
| --- | --- | --- | --- |
| SRS | Every group of size n equally likely | Simple, general purpose | Hard to implement without a complete list; still can have nonresponse |
| Stratified | Random within important subgroups | Ensures subgroup representation; can improve precision | Wrong strata choice or confusing with clusters |
| Cluster | Randomly choose groups, sample everyone in them | Cost/time savings when groups are natural | Clusters not representative mini-populations |
| Systematic | Random start + every kth | Easy to carry out; spreads sample across list | Periodicity; non-random start |

Worked classification practice (what method is it?)

Scenario A: A company wants feedback from 200 employees. They separate employees by department (sales, engineering, HR) and randomly sample 10% from each department.

  • This is stratified (departments are strata; sample within each).

Scenario B: A university randomly selects 8 dorm floors and surveys everyone living on those floors.

  • This is cluster (floors are clusters; include all within selected clusters).

Scenario C: A teacher uses a random number generator to pick 12 students out of 32.

  • This is an SRS if every set of 12 is equally likely.

Scenario D: A factory inspects every 50th item after randomly selecting the first item from 1 to 50.

  • This is systematic.

Exam Focus
  • Typical question patterns:
    • Identify the sampling method used in a described procedure (SRS vs stratified vs cluster vs systematic).
    • Describe how to take a specific kind of sample (often “Describe an SRS of size n” with labeling and random selection details).
    • Compare methods: explain why stratifying by a given variable could improve accuracy, or why cluster sampling is cheaper.
  • Common mistakes:
    • Mixing up stratified and cluster (remember: strata = similar within; clusters = mixed within).
    • Calling a convenience method “random” without an actual chance process.
    • For systematic sampling, forgetting the random start or ignoring possible periodic patterns in the list.

Sources of Bias in Sampling

Even if you understand sampling methods, your data can still be unreliable if the method systematically favors certain outcomes. Bias in sampling means your sampling process tends to produce estimates that are consistently too high or too low (or consistently misrepresentative) for the population.

A key mindset shift: bias is about how the data were produced, not about whether you like the results.

Undercoverage (missing parts of the population)

Undercoverage occurs when some members of the population are left out of the sampling frame or are very unlikely to be selected.

Why it matters: if the missing group differs in the response variable, your sample statistic will be systematically off.

How it happens:

  • Using an incomplete list (sampling frame) that omits a segment (e.g., surveying “all residents” using a list of landline phone numbers).
  • Sampling at locations/times that exclude certain people (e.g., surveying commuters at 10 a.m.).

Example (undercoverage):
A poll about cafeteria satisfaction is done by sampling students present in the cafeteria during lunch. Students who bring lunch or skip lunch are less likely to be included and might have systematically different opinions about the cafeteria.

Common misconception: “But we sampled randomly from the people who were there.” Randomness within the wrong frame does not fix undercoverage.

Nonresponse bias (selected but don’t participate)

Nonresponse bias happens when individuals chosen for the sample can’t be contacted or refuse to participate, and those non-responders differ from responders.

Why it matters: you can start with a beautifully designed random sample and still end up biased if response rates are low and nonresponders are systematically different.

Example (nonresponse):
A random sample of households is mailed a long survey about finances. People with very high or very low incomes might be less likely to respond, skewing estimates of average income.

How to reduce it (conceptually):

  • Follow-up contacts, reminders
  • Incentives
  • Shorter, clearer surveys

(You don’t need to memorize a specific “best” fix for AP, but you should be able to propose reasonable ways to improve response.)

Response bias (inaccurate answers)

Response bias occurs when the responses people give are systematically inaccurate. This is different from nonresponse: people do respond, but their answers are biased.

Why it matters: response bias can distort conclusions even if the sample is random and everyone responds.

Common causes:

  • Social desirability: people misreport to look good (e.g., underreporting cheating).
  • Sensitive questions: respondents may lie or skip details.
  • Interviewer effects: the presence or behavior of an interviewer influences responses.
  • Poorly worded questions (closely related, discussed next).

Example (response bias):
A survey asks students, “How many hours did you study last night?” Some students inflate the number to appear responsible.

Wording bias and leading questions

A leading question encourages a particular response through its wording.

Why it matters: it can push people toward an answer regardless of their true opinion, creating systematic error.

Example (leading):
“Do you agree that the school should improve its outdated and unsafe gym equipment?”

This question contains loaded language (“outdated and unsafe”) that nudges respondents toward agreement.

Better: “Do you think the school should replace gym equipment this year?” (And often you’d include balanced response options.)

AP-style tip: when asked to critique a survey, look for emotionally charged words, double-barreled questions (“and”), and answer choices that don’t cover all reasonable responses.

Voluntary response samples (a particularly biased design)

A voluntary response sample consists of people who choose themselves to participate, often because they have strong opinions.

Why it matters: people with extreme experiences are more likely to respond, so results are often biased toward strong feelings.

Example (voluntary response):
A news website posts a poll: “Was the referee unfair?” Visitors choose whether to click yes/no. People who are angry are more likely to respond.

Voluntary response is not fixed by large sample size—getting 50,000 self-selected responses can still be badly biased.

Convenience samples (easy to collect, hard to trust)

A convenience sample is chosen because it is easy to access (friends, people in your class, shoppers at one store).

Why it matters: convenience sampling often produces undercoverage (you’re missing types of people who aren’t convenient to reach). Sometimes it’s okay for quick, informal information, but it typically does not support valid generalization.

Example (convenience):
A student surveys only students in their own AP classes to estimate the average time students spend on homework at the school.

Bias vs. variability: what students often mix up

It’s easy to treat “wrong result” as “bias,” but AP Statistics wants you to identify the mechanism.

  • Random variability: different random samples give different results; this is expected.
  • Bias: the method tends to miss in the same direction over and over.

A sample can be unbiased but still not match the population perfectly—especially if n is small.
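A small simulation illustrates the difference (the population and response rates below are invented): an unbiased SRS scatters around the true value, while a voluntary-response mechanism misses in the same direction on average.

```python
import random

random.seed(42)  # fixed seed for reproducibility

# Invented population: 10,000 people, true support proportion 0.50,
# but supporters are twice as likely to respond to a voluntary poll.
population = [1] * 5000 + [0] * 5000
true_p = sum(population) / len(population)  # 0.50

def srs_estimate(n):
    """Unbiased method: a true SRS from the whole population."""
    return sum(random.sample(population, n)) / n

def voluntary_estimate(n):
    """Biased method: supporters respond with prob 0.8, others with prob 0.4,
    and the sample comes only from responders."""
    responders = [x for x in population if random.random() < (0.8 if x else 0.4)]
    return sum(random.sample(responders, n)) / n

# Average each method's estimate over 200 repeated samples of size 100.
srs_mean = sum(srs_estimate(100) for _ in range(200)) / 200
vol_mean = sum(voluntary_estimate(100) for _ in range(200)) / 200
print(round(srs_mean, 2), round(vol_mean, 2))
# srs_mean sits near 0.50 (variable but unbiased);
# vol_mean sits near 2/3 (a systematic overestimate that more samples won't fix).
```

Repeating the biased method many times does not pull its average back toward the truth; that is exactly what "misses in the same direction over and over" means.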

Worked example: diagnosing bias in a sampling plan

Scenario: A town wants to estimate support for a new property tax. They call residents using a list of landline phone numbers and ask, “Do you support this important tax that will improve our schools?” Many numbers don’t answer; they don’t call back.

Diagnose issues:

  1. Undercoverage: residents without landlines (often younger people) are excluded.
  2. Wording/leading: “important” and “improve our schools” push respondents toward “support.”
  3. Nonresponse: people who don’t answer might systematically differ (and no call-backs makes it worse).

Even if they “randomly select” numbers from the landline list, the design still has multiple bias sources.

Exam Focus
  • Typical question patterns:
    • Identify a likely source of bias (undercoverage, nonresponse, response bias, voluntary response, convenience, wording) from a scenario.
    • Explain the direction/impact in context (who is overrepresented or underrepresented, and how that could affect the estimate).
    • Propose a specific improvement (use a better sampling frame, follow-ups, neutral wording, random selection).
  • Common mistakes:
    • Saying “it’s biased because it’s not random” without naming the specific mechanism (undercoverage vs voluntary response vs convenience).
    • Confusing nonresponse (no data from selected people) with response bias (people respond inaccurately).
    • Assuming bias disappears with a large sample size; large biased samples are still biased.