AP Statistics Exam Review Flashcards

0.0(0)

Studied by 0 people

0%Exam Mastery

Build your Mastery score

AP Practice

Supplemental Materials

Call Kai

Card Sorting

1/82

Earn XP

Description and Tags

Statistics

AP Statistics

Last updated 7:34 AM on 5/14/26

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai	Chat

No analytics yet

Send a link to your students to track their progress

83 Terms

New cards

Additional Rule

General Rule (Non-Mutually Exclusive): Used when events can happen simultaneously. You subtract the probability of both events occurring to avoid double-counting.

Example: Drawing a Heart (13/52) or a Queen (4/52)

Mutually Exclusive Events: Used when events cannot happen simultaneously

Example: Rolling a 3 or a 5 on a single die

New cards

Bias

A systematic, non-random error that causes a misalignment between the true population parameters and the sample statistics, resulting in distorted results that consistently overestimate or underestimate a value

New cards

Binomial Setting

A probability model for a scenario involving a fixed number of independent trials (n), where each trial has only two possible outcomes—labeled "success" (p) or "failure" (1 - p)—and the probability of success remains constant across all trials

Binary: Outcomes are strictly "success" or "failure".
Independent: The outcome of one trial does not affect another.
Number: There is a fixed number of trials (n) determined beforehand.
Success: The probability of success (p) is the same for each trial

New cards

Binomial Distribution

A graph that models the number of successes (x) in a fixed number (n) of independent trials, each with a constant probability of success (p).

New cards

Binomial Random Variable

a discrete random variable that counts the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes ("success" or "failure") and the probability of success (p) is constant.

New cards

Blinding

An experimental design technique used to prevent bias by concealing which treatment subjects receive.

Prevent bias by concealing which treatment subjects receive. It ensures that participants and researchers do not know group assignments, reducing the impact of expectations or subjective interpretation on results

New cards

Block Design

An experimental technique used to reduce variability by grouping similar experimental units into "blocks" based on a shared characteristic before randomly assigning treatments within each block

New cards

Central Limit Theorem

states that for a sufficiently large sample size (n>30)

It allows for normal-based statistical inference (confidence intervals/hypothesis tests) even with skewed data.

New cards

Chi-Squared Distribution

The chi-square distribution is a fundamental probability distribution in inferential statistics used primarily for hypothesis testing, goodness-of-fit tests, and confidence intervals

t describes the sum of squared independent standard normal random variables, with its shape determined solely by degrees of freedom

New cards

Chi-Squared Statistic

A numerical measure used in statistics to determine the difference between observed and expected frequencies in categorical data.

New cards

Chi-Squared GOF Test

a statistical hypothesis test that determines if an observed frequency distribution of categorical data matches an expected, theoretical distribution

New cards

Chi-Squared Test for Homogeneity

Determines if different populations (or groups) share the same distribution of a single categorical variable.

Checks if differences in observed frequencies are due to sampling error or actual population differences, using a null hypothesis that all populations have the same distribution

New cards

Chi-Squared Test for Association/Independence

a statistical method used to determine whether two categorical variables are related in a population

It evaluates if the observed frequencies in a contingency table significantly differ from the frequencies we would expect if the variables were completely independent

New cards

Cluster Sample

a probability sampling method where the population is divided into heterogeneous groups(often based on location), and a simple random sample (SRS) of these clusters is chosen

New cards

Coefficient of Determination

measures the proportion of variability in the response variable (y) that is explained by the linear regression model using the explanatory variable (x).

New cards

Complement

represents all outcomes in the sample space that are not included in a sample

New cards

Conditional Probability

measures the likelihood of an event (A) occurring, given that another event (B) has already happened, denoted as P(A|B)

New cards

Conditions for Inference About a Mean

Random: The data must come from a random sample or a randomized experiment to prevent bias and ensure generalization

Normal / Large Sample: states the population is normally distributed, the sample size is large (n>30) (CLT),a graph of the sample data shows no strong skewness or extreme outliers.

Independence (10% Rule): When sampling without replacement, the sample size (n) must be no more than 10% of the total population size (N), written as n >0.10N

New cards

Conditions for Inference About a Proportion

Random: The data must come from a randomized experiment or a random sample to avoid bias.

10% Condition: The sample size must be <10% of the overall population to ensure independence of individual observations (when sampling without replacement).

Large Counts / Normal: The sampling distribution must be approximately Normal. Both the number of successes(np) and failures n(1-p) in your sample must be at least 10

New cards

Conditional for Regression Inference

L - Linear: The relationship between the explanatory variable ($x$) and response variable ($y$) must be genuinely linear.
I - Independent: Individual observations must be independent of each other.
N - Normal: For any fixed value of $x$, the response variable ($y$) should vary normally.
E - Equal Variance (Homoscedasticity): The standard deviation of the residuals must be consistent across all values of $x$.
R - Random: The data must be produced by a well-designed random sample or randomized experiment

New cards

Confidence Interval for a Population Mean

A confidence interval estimates the true population mean using sample data, providing a range of plausible values and a margin of error.

New cards

Confidence Interval for a Population Proportion

used to estimate an unknown population proportion ($p$) based on sample data.

New cards

Confidence Intervals and Two-Sided Tests

Confidence Intervals and Two-Sided Significance Tests are mathematically dual; a two-sided test at significance level (a) and a (1-a) confidence interval will always reach the same conclusion. If the null value falls inside the interval, you fail to reject Ho; if it falls outside, you reject Ho

New cards

Confidence Intervals for Comparing Two Proportions

A two-proportion $z$-interval estimates the difference between two population proportions, p1 - p2, using sample data.

$<p>A two-proportion $z$-interval estimates the difference between two population proportions, p1 - p2, using sample data.</p>$

New cards

Comparing Two Means

Involves analyzing the difference between two independent population means ) using a two-sample t-test

New cards

Confidence Level

estimate unknown population parameters mean or proportion using a range of plausible values derived from sample data.

New cards

Confounding Variable

an outside factor related to both the explanatory (independent) and response (dependent) variables, making it impossible to tell which variable caused the observed outcome

New cards

Continuous Random Variable

takes on any numerical value within an interval (e.g., time, weight, height) and is modeled by a Probability Density Function (PDF) curve

New cards

Control Group

a baseline group in an experiment that does not receive the new treatment or receives a placebo/standard treatment, allowing comparison to the experimental group

New cards

Convenience Sampling

a non-probability sampling method that selects individuals who are easiest to reach, often leading to biased results because it fails to represent the entire population

New cards

Correlation

measures the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to +1.

New cards

Critical Value

determines the boundaries of the rejection region in hypothesis testing or defines the margin of error in confidence intervals

New cards

Discrete Random Variable

numerical outcomes of chance processes with a countable, finite set of distinct values and associated probabilities.

New cards

Empirical Rule

he Empirical Rule (68-95-99.7 rule) in AP Statistics states that for normal distributions, approximately 68% of data falls within 1 standard deviation of the mean, 95% within 2 standard deviation, and 99.7% within 3 standard deviations

New cards

Experiment

a study that deliberately imposes a treatment on individuals to measure their responses, aimed at establishing causation

New cards

Explanatory Variable

the variable used to explain, predict, or cause changes in another variable—the response variable—in a statistical study

New cards

Five-Number Summary

provides a concise snapshot of a dataset's distribution, consisting of the Minimum, First Quartile (Q1), Median (Q2), Third Quartile (Q3), and Maximum.

New cards

Geometric Setting

occurs when independent trials are repeated until the first success, characterized by the BINS acronym: Binary (success/failure), Independent trials, same Number (constant probability (p), and Success

New cards

Geometric Distribution

models the number of independent Bernoulli trials needed to achieve the first success.

New cards

Geometric Random Variable

models the number of independent trials needed to achieve the first success in a Bernoulli process, where each trial has a constant probability of success p

New cards

Inference about a Population Proportion

involves estimating or testing claims about a population parameter using a sample proportion

New cards

Inference about a population mean

involves using sample data to estimate or test claims about a true population mean using t-procedures, as the population standard deviation is rarely known

New cards

Inference for Comparing Two Proportions

involves using a two-sample $z$-test or a two-sample $z$-interval to compare independent population proportions (p1-p2) based on sample data

New cards

Inference for Comparing Two Means

involves using a two-sample t-procedure to test if the difference between two independent group means ($\mu_1 - \mu_2$) is statistically significant.

New cards

Interquartile Range (IQR)

represents the spread of the middle 50% of data, calculated as IQR = Q3 - Q1

New cards

Law of Large Numbers

states that as the number of independent, random trials ($n$) increases, the experimental (sample) mean or proportion converges to the theoretical (population) mean or probability

New cards

Least Squares Regression Line (LSRL)

e unique "line of best fit" for a scatterplot, formulated as y^ = a + bx to minimize the sum of squared vertical residuals,

New cards

Lurking Variable

an unmeasured, hidden variable not included as an explanatory or response variable in a study, yet it influences both, creating a false impression of causation or masking the true relationship

New cards

Margin of Error

maximum likely distance between a sample statistic and the true population parameter, acting as the "radius" of a confidence interval

<p><span>maximum likely distance between a sample statistic </span>and the true population parameter, acting as the "radius" of a confidence interval</p>

New cards

Matched Pairs Design

a special type of randomized block design in AP Statistics, used for comparing two treatments. It involves creating pairs of subjects that are as similar as possible (or using the same subject for both treatments) to reduce variability, then randomly assigning the treatments within each pair

New cards

Mean of a Binomial Random Variable

s calculated by multiplying the number of trials (n) by the probability of success (p) in each trial

New cards

Mean of a Discrete Random Variable

the long-run average outcome over many repetitions, calculated as the weighted average of all possible values

New cards

Mean of a Geometric Random Variable

the expected number of trials needed to achieve the first success

New cards

Multiplication Rule for Independent Events

If two events, A and B, are independent, the probability of both occurring is (P(A and B) = P(A) x P(B)

New cards

Multiplication Rule for Dependent Events

calculates the probability that both events occur when the outcome of the first event affects the second

New cards

Nonresistant Measure

summary statistics—specifically the mean, standard deviation, and range—that are highly sensitive to outliers and skewness.

New cards

Nonresponse

occurs when individuals selected for a sample fail to respond or cannot be reached, and these individuals differ systematically from those who do respond

New cards

Normal Distribution

symmetric, bell-shaped density curves defined by a mean and standard deviation. They are used to model continuous data, with total area under the curve equaling 1

<p>symmetric, bell-shaped density curves defined by a mean and standard deviation. They are used to model continuous data, with total area under the curve equaling 1</p>

New cards

Observational Study

collecting data by observing subjects or measuring variables without manipulating the environment or imposing treatments

New cards

Outlier

a data point that differs significantly from the overall pattern of a dataset

New cards

Percentile

represents the percentage of data values that are less than or equal to a specific value

New cards

Placebo Effect

occurs when experimental units show a real response to an inactive treatment (placebo) simply because they believe they are receiving active treatment

New cards

Power

It is the probability of a Type II error). It represents the ability of a test to detect an effect or difference when one actually exists, typically aiming for 0.80 or higher

New cards

Principles of Experimental Design

Control, Randomization, Replication, and Comparison

New cards

Probability Distribution

describes all possible values a random variable can take and their associated probabilities. It defines the likelihood of outcomes in a sample space, covering discrete (countable) and continuous (interval) variables.

New cards

P-Value

measures the probability, assuming the null hypothesis (Ho) is true, of obtaining a test statistic as extreme or more extreme than the observed result

New cards

Random Variable

is a numerical variable whose value is determined by the outcome of a chance process. It assigns a numerical value to each outcome in a sample space, falling into two main types: discrete or continuous

New cards

Randomized Block Design

An experimental design used to reduce variability by grouping similar experimental units (blocks) based on a variable suspected to influence the response, then randomly assigning treatments within each block

New cards

Replication

a foundational principle of experimental design, requiring the application of treatments to multiple experimental units (subjects) to reduce chance variation and identify true patterns

New cards

Residual

The vertical distance between an observed data point and the least-squares regression line

New cards

Residual Plot

a scatterplot of the residuals (actual y - predicted y) on the vertical axis and the explanatory variable (x) on the horizontal axis

New cards

Response Bias

occurs when survey design, question wording, or interviewer behavior causes respondents to give inaccurate, untruthful, or misleading answers, leading to systematic errors

New cards

Response Variable

the outcome, dependent, or "Y" variable (y) that is measured or predicted in a study or experiment

New cards

Significance Level

The threshold set before a study to determine if a result is statistically significant

New cards

Simple Random Sample (SRS)

a sampling method where every possible group of n individuals in a population has an equal chance of being selected.

New cards

Skewed Distribution

an asymmetrical data distribution where one tail is longer or fatter than the other, indicating data points are not evenly spread around the mean

<p>an asymmetrical data distribution where one tail is longer or fatter than the other, indicating data points are not evenly spread around the mean</p>

New cards

standard deviation

measures the average distance of data points from the mean, quantifying spread in a dataset

New cards

Stratified Random Sampling

a sampling method where the population is divided into distinct, non-overlapping subgroups called strata (based on similar characteristics), and then a separate simple random sample (SRS) is taken from each stratum

New cards

Type I Error

when you reject a true null hypothesis (Ho), often called a "false positive".

New cards

Type II Error

hen you fail to reject a null hypothesis (Ho) that is actually false

New cards

Undercoverage

occurs when certain segments of a population are systematically left out or underrepresented in the sampling frame

New cards

Variability of a Statistic

defined by the spread of its sampling distribution, describes how much a statistic varies from sample to sample

New cards

z-score

measures how many standard deviations a data point ($x$) is above or below the mean

$<p>measures how many standard deviations a data point ($x$) is above or below the mean</p>$