AP Stats Unit 4: Collecting Data - Terms & Methods

5.0(1)

Studied by 17 people

0%Unit 3 Mastery

0%Exam Mastery

Build your Mastery score

AP Practice

Supplemental Materials

Call Kai

Card Sorting

1/61

Earn XP

Description and Tags

Corresponds to AP Stats Unit 3

Statistics

Collecting Data

AP Statistics

Unit 3: Collecting Data

Last updated 5:53 PM on 10/5/25

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai

No analytics yet

Send a link to your students to track their progress

62 Terms

New cards

bias

when the calculated value from the sample is CONSISTENTLY an over- or underestimate of the true value of the population

low bias = high accuracy
high bias = low accuracy

New cards

accuracy

when the calculated value from the sample (the mean, etc) is close to the true value of the population

high accuracy = low bias
low accuracy = high bias

New cards

variability

when the calculated values from the sample are CONSISTENTLY scattered/can be at extremes and far away from each other

or, low variability = when calculated values are CONSISTENTLY CLOSE to each other

*does not imply that they have to be close or far from the true value of the population; that’s bias

low variability = high precision
high variability = low precision

New cards

precision

when the calculated values from the sample are CONSISTENTLY close to each other

high precision = low variability
low precision = high variability

New cards

census

a survey of the ENTIRE population; usually not feasible, and samples are preferred

New cards

simple random sample (SRS)

sampling method where choosing every individual AND group of individuals is equally likely to occur

LOW BIAS
can lead to mid-high VARIABILITY

New cards

METHOD: conducting a simple random sample (SRS)

label the individuals by assigning them numbers from 1-X
1. or, write names on slips of paper
using an RNG, randomize the labels by choosing [sample size] UNIQUE numbers
1. or, shuffle slips in a hat and select [sample size] - without repeating any names
select the individuals (and conduct the study/administer a survey/etc) that correspond to those numbers
1. or, select the individuals whose names were chosen

New cards

population

entire group/pool of potential people who could be selected for a study

CAN generalize to the population when: random sample FROM that exact population is chosen

New cards

convenience sample

sampling method where the experimenter chooses the experimental units that are closest by, the first ones, easiest to reach, etc.

HIGH BIAS (first ones often have a trait that the rest of the population might not)

New cards

voluntary response sample/bias

sampling method where the experimental units/subjects choose to be a part of the sample

HIGH BIAS: people who choose often have polarized opinions

New cards

μ (population mean)

population mean; true value that the sample mean (σ) should be close to

New cards

stratified random sample

sampling method where the population is split into homogeneous groups (strata), and SRS’s of the appropriate/representative number are taken from each stratum to make up the whole sample

LOW BIAS
LOW VARIABILITY: due to homogeneous groups, groups that tend to be similar in the measured value aren’t going to be disproportionately overrepresented, leading to a lower variability in the calculated sample value

New cards

METHOD: conducting a stratified random sample

split the population into [#] strata according to [trait]
starting with the [X] stratum, assign all [# participants] numbers from 1-X
use an RNG to randomly select [stratum sample size] unique numbers between [#-# in the stratum]
1. with [stratum sample size] being the proper number of experimental units from that stratum, such that the stratum’s representation in the sample is weighted as desired
select (and conduct the study upon) the individuals whose labels correspond to the selected numbers
repeat this process for the other strata

added from SRS: splitting population & starting from x stratum; repeating process for other strata

New cards

REASONING: conducting a stratified random sample

to reduce variability from an SRS; the strata should be homogeneous in terms of their values that the study is measuring, so limiting how many experimental units come from that stratum will make it imposible to overrepresent values like theirs → less extremity in both directions

New cards

cluster sample

sampling method where the population is split into heterogeneous groups (clusters), and then the proper number of those clusters is chosen to create the whole sample

LOW BIAS
same VARIABILITY as SRS
EASIER/LESS COSTLY TO CARRY OUT

New cards

METHOD: conducting a cluster sample

split the population of [# experimental units] into [#] clusters according to the [cluster reason/group]
label each cluster with a number from 1-[# of clusters]
using an RNG, choose [# of clusters to select according to sample size; pick ENOUGH clusters to SATISFY sample size] unique numbers from 1-[# of clusters]
select the clusters that correspond to the chosen numbers, then select all of the [experimental units] within those chosen clusters
1. i.e. conduct a census of the cluster

added from SRS: split population into clusters, label CLUSTERS and CHOOSE # of clusters, then select all individuals within those clusters

New cards

systematic random sample

sampling method where a starting individual is randomly identified in the population, and then the individual x individuals away from that individual (x being the interval size) is selected; repeats until the desired sample size is achieved (and should go through the whole population at least once)

New cards

METHOD: conducting a systematic random sample

label all [experimental units] with a number between 1-X
using an RNG, generate a number between 1-[# of the interval’. the [individual] corresponding to this number represents the starting [individual].
select the starting [individual]. then, select every [#interval]th player that is past this player on the list, UNTIL [#sample size] players are selected.

interval selection: if you need sample size of n and population is N, interval should be N/n ish so you go through about the whole population

New cards

REASONING: conducting a cluster sample

easy, fast, and cost-effective to collect data from a cluster, because they’re all centralized in the same location(s)

New cards

REASONING: conducting a systematic random sample

easy and simpler to carry out and conduct (in some ways)

if the list is in order of something, it has an effect that reduces variability (assuming the order is homogeneous somehow)
otherwise, it is just another method of getting a sample

New cards

“pathway” to a study

population
sample frame
1. undercoverage bias starts here
target sample
1. nonresponse bias starts here (between this & respondents)
respondents
1. response bias starts here
2. question wording bias starts here

New cards

sample frame

the list (in reality) of experimental units available from which a sample is chosen; aimed to be as equal to the population as possible

New cards

target sample

selecting a sample from the sample frame using an unbiased method, potentially reducing variability by choosing stratification, etc.

New cards

undercoverage bias

when the sample frame is NOT representative of the population; usually, it excludes or favors certain groups in the population

New cards

nonresponse bias

when a proportion of the target sample does NOT respond and does NOT become part of respondents

cause: people can’t be reached or refuse to answer
effect: less responses, may exclude a certain group (that’s more likely to not answer) from the final respondents group

New cards

response bias

when the data gathered from respondents (whether the data itself, the process of getting the data, etc) has bias or issues of some kind and affects the accuracy of the responses

examples: experimenter’s attire when collecting the data pressures people to give a certain answer so they don’t disappoint the experimenter; people lie to represent themselves in a better light to the experimenter

New cards

question wording bias (response bias)

specific type of response bias, where the respondents give inaccurate responses because the question is worded in a manner that pressures them to give a certain response

New cards

observational study

study where NO treatments are imposed on experimental units; the subjects are NOT made to do anything

New cards

prospective (observational) study

“looking forward” — a type of observational study, where experimental units are identified in advance (potentially collecting data at the beginning), and then later experimenters follow up with the experimental units (no imposition of treatments)

New cards

retrospective (observational) study

“looking backward” — a type of observational study, where experimental units are identified and their pasts are inquired about (no imposition of treatments)

New cards

experiment

a type of study where treatments are imposed on experimental units

New cards

experimental units

participants in the study (can be objects, animals, people, etc); what or who treatments are imposed on

subjects if the experimental units are human beings

New cards

treatments

what is done/not done to experiment units/what is imposed or not imposed; usually the explanatory variable’s options; can include levels OR combinations of the explanatory variables

New cards

explanatory variable (factor)

impacts the response variable; the options/treatments possible — must specify:

“whether or not” if it’s a YES/NO
the “level” of the treatment if it’s a varying intensity

New cards

response variable

the measured variable that is analyzed after the explanatory variable is manipulated

New cards

confounding variables

other potential factors that affect or create the results that we observe (especially common in observational studies); related to/influences the explanatory variable (one goes up, other goes down; both go up; both go down), and ALSO influences the response variable (one goes up, other goes down; both go up; both go down)

MUST SPECIFY relationship/direction between:
- confounding variable & explanatory variable
- confounding variable & response variable
creates illusion that explanatory variable affects the response variable in the way that the confounding variable does
correlation, but NOT causation

New cards

well-designed experiments include…

replication
random assignment
comparison
control

New cards

replication

repeating the study multiple times, either through multiple trials on the same people OR multiple trials with different groups of people

increases validity

New cards

random assignment

when experimental units are randomly assigned their treatments; reduces bias and potential confounding variables that would result from subjects choosing their own treatments

benefit: allows us to claim/prove causation between the treatments/explanatory variables & the response variables
allows GROUPS to be ROUGHLY EQUIVALENT

New cards

comparison

having another group, whether a control or an experimental group, to compare results to

control group: group without treatment that acts as comparison
experimental group: group receiving a different treatment that acts as comparison
- required to have an experimental group with another treatment draw conclusions between these 2 different treatments

New cards

control

keeping other variables (that may affect results) constant, to reduce potential confounding variables

New cards

CONDUCTING: random assignment

label all individuals within the sample 1-X
using an RNG, select [amount for one treatment] unique #s between 1-X
the individuals who were selected receive the [X] treatment
if more than 2 treatments: remove the numbers selected and then reselect [amount for one treatment] more unique #s between 1-X that weren’t chosen before; the individuals corresponding to these numbers will receive the [X] treatment…
1. repeat for all treatments (except the last one:)
last treatment: the remaining individuals will receive the [X, last] treatment

New cards

placebo effect

when people receive a treatment without an active ingredient/any true effect, they still show “improvement”/get better due to a psychological bias and idea that they should feel better from the treatment

New cards

blinding

when the experimenters or the subjects (or both; see double-blinding) are not aware of whether subjects received a real treatment or a placebo

reduces bias:

for subjects: will not have biased results depending on if they know they got a real treatment or not; can determine if there is the placebo effect
for experimenters: will not act biased when interacting with subjects depending on their treatment or placebo assignment (e.g. won’t tailor themselves in certain ways, won’t subconsciously hint at/reveal the truth of the treatment)

New cards

double-blinding

when NEITHER the experimenters NOR the subjects are aware of whether subjects were given a real treatment or a placebo; reduces bias on both ends (see blinding)

New cards

completely randomized design

an experimental design that consists of a single experiment, with experimental subjects in a randomized and heterogeneous sample

New cards

block

group of experimental units that are similar with respect to their response (to the measured, response variable)

New cards

randomized block design

an experimental design that separates the sample into homogeneous (with respect to their response to the response variable) blocks, then randomly assigns all treatments within each block (as if conducting microcosm-experiments in each block)

reduces variability** IF blocks are correctly split into homogeneous groups, since their responses are similar to each other
controls for BLOCKS as confounding variables

New cards

METHOD: conducting a randomized block design (writing)

separate the [experimental units] into blocks of [X] (specify all blocks). within each block, number the [experimental units] from 1-X
for the first block, use an RNG to select [X] unique numbers, and give the individuals who correspond to those numbers the [X] treatment. (specify the other treatments…) repeat this step for every block
compare their [response variable/effect] in each of the blocks
finally, bring all the blocks together to combine and compare overall

New cards

METHOD: randomized block design with ONE factor (diagram)

New cards

METHOD: randomized block design with MULTIPLE factors (diagram)

New cards

REASONING: randomized block design

Randomized block design helps to:

control for ____ (what the blocks are sorted for) as a confounding factor.

Within each block, there will be:

lower variability of results

making it easier to determine ____ (what the experiment is trying to determine)

New cards

REASONING: confounding variables are problems

If [an effect is observed], we wouldn’t know if it was because of [explanatory variable] or because of [confounding variable], as [confounding variable] may be a confounding variable and we would not correctly determine causality.

New cards

matched pairs design

a type of randomized block design where the “blocks” are pairs, of either 2 very similar subjects OR 1 person but 2 trials/2 parts of the subject

2 subjects: the two subjects are randomly assigned to 1 of 2 treatments, and results are compared
1 subject: each subject receives 2 treatments, still randomized: e.g. the order in which they receive them is randomized; where they receive them (e.g. on two comparable parts of their body, like right/left leg) is randomized; results are compared

New cards

METHOD: conducting a matched pairs design

if 2 subjects: pair 2 very similar/comparable subjects, then the next two, and so on, forming [#] pairs. label the first person in the pair “1,” and the second “2”
1. randomize: using an RNG, select 1 or 2. for the first pair, the subject corresponding to the # selected will receive [X] treatment; the other subject will receive [Y] treatment
if one subject: for each subject, run the RNG for 1 or 2, and then assign [X] treatment first/on the right/etc if 1, second/on the left if 2 (or something similar)
repeat the above for all pairs/subjects
compare the difference between the two treatments for every pair of subjects/subject

New cards

simulation

used to determine the chance of an event, statistic value/threshold, etc. of occurring by chance alone; repeatedly randomly assigns the values to random labels and plots results on a scatterplot

New cards

confidence interval %

the chance that, if explanatory does NOT affect response variable, you obtain the result that you got by chance alone

found by repeatedly simulating the data points by randomly assigning them to the results & creating a scatterplot of how many got a certain % of results

New cards

CALCULATION: confidence interval %

confidence interval % for your result = [# of points ≥ yours]/[total # points] * 100%

New cards

statistically significant (CI 5%)

when the results of an experiment are unlikely (less than 5%, >=5%) to happen by chance alone

if it IS statistically significant, we have convincing evidence that the treatment caused a difference

New cards

scope of inference

whether we can conclude causality, and to what extent we can infer the results and generalize to what population

New cards

when can we conclude causation?

when random assignment of treatments is present

New cards

when can we conclude generalization to the population?

when a random sample from THAT POPULATION is taken

can be inferred up to the population that the sample was taken from (i.e. up until the point that all those that could have been chosen)