Last Minute AP Statistics Cheat Sheet (WITH FORMULAS)

What You Need to Know

This is the high-yield formula + procedure sheet you use when you’re trying to (1) pick the right method fast, (2) check conditions correctly, and (3) write the minimum-necessary but full-credit inference “story” (parameter → conditions → compute → conclude in context).

Big AP Stats idea: almost every FRQ is either describing data, probability/random variables, or inference (confidence interval or significance test). The fastest way to lose points is skipping conditions or failing to define the parameter.

Golden rule: Your hypotheses and conclusion must be about a population parameter (like p, μ, μ₁ − μ₂, β), not about sample statistics (like p̂, x̄).


Step-by-Step Breakdown

A. Picking the right inference procedure (fast decision tree)
  1. Is your response variable categorical (yes/no) or quantitative (number)?
    • Categorical → proportions, chi-square.
    • Quantitative → means, t-procedures, regression.
  2. How many groups/samples?
    • 1 sample → one-proportion z or one-mean t.
    • 2 independent samples → two-proportion z or two-sample t.
    • Matched pairs → one-sample t on differences.
  3. Are you comparing distributions of categories across groups?
    • One categorical variable vs a claimed model → chi-square GOF.
    • Two categorical variables (relationship) → chi-square independence.
    • Several populations/treatments and one categorical response → chi-square homogeneity.
  4. Is it a relationship between two quantitative variables?
    • Use linear regression; inference about the slope β uses a t test/interval with df = n − 2.
B. Writing any inference solution (full-credit skeleton)
  1. Define the parameter (in context).
    • Example: p = true proportion of all students at your school who…
  2. State hypotheses (test only).
    • H₀: p = p₀, Hₐ: p ≠ p₀ (or <, >)
  3. Check conditions (name each one + verify it with the given information).
    • Random, 10% condition, Normal/Large Counts, etc.
  4. Compute the test statistic or interval (show the formula + plug in values).
  5. Use the P-value OR critical-value method (usually the P-value on the AP exam).
  6. Conclude in context at level α.
    • “Because the P-value < α, reject H₀. There is convincing evidence that …”
C. Mini worked walkthrough (one-proportion z test)

Prompt style: “Is there evidence the true proportion differs from 0.40?”

  1. Parameter: p = true proportion of (population) who …
  2. Hypotheses: H₀: p = 0.40, Hₐ: p ≠ 0.40
  3. Conditions:
    • Random: stated random sample/assignment.
    • 10%: n ≤ 0.1N (if sampling without replacement).
    • Large counts: np₀ ≥ 10 and n(1 − p₀) ≥ 10.
  4. Compute:
    • p̂ = x/n
    • z = (p̂ − p₀) / √(p₀(1 − p₀)/n)
  5. Get the P-value from the Normal distribution.
  6. Conclude in context.
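The compute step above can be sketched in a few lines of Python. All numbers here are invented for illustration (57 “yes” responses in a random sample of n = 120, testing H₀: p = 0.40):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data: 57 "yes" responses in a random sample of n = 120;
# test H0: p = 0.40 vs Ha: p != 0.40.
x, n, p0 = 57, 120, 0.40

# Large counts check for a TEST uses p0: np0 >= 10 and n(1 - p0) >= 10
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = x / n                                   # sample proportion
se = sqrt(p0 * (1 - p0) / n)                    # SE under H0 uses p0, not p-hat
z = (p_hat - p0) / se                           # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided P-value from the Normal

print(round(z, 3), round(p_value, 4))
```

On the exam you would show the same plug-in by hand; the code just mirrors steps 4–5.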

Key Formulas, Rules & Facts

A. Describing data (quick hits)
| Tool | Formula | When to use | Notes |
| --- | --- | --- | --- |
| Standard score | z = (x − μ)/σ or z = (x − x̄)/s | Comparing a value to the distribution’s center/spread | “How many SDs from the mean?” |
| Outlier rule | below Q₁ − 1.5·IQR or above Q₃ + 1.5·IQR | Boxplots/outliers | Not “proof,” just a flag |
| Density/probability | area under the curve | Continuous models | Probability = area |
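The first two rows of the table, evaluated on invented numbers (a score of 86 in a class with mean 74 and SD 8, and hypothetical quartiles Q₁ = 61, Q₃ = 83):

```python
# Hypothetical example: standard score and 1.5*IQR outlier fences.
x, mu, sigma = 86, 74, 8
z = (x - mu) / sigma          # standard score: SDs above/below the mean

q1, q3 = 61, 83               # hypothetical quartiles
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # values below this get flagged as outliers
high_fence = q3 + 1.5 * iqr   # values above this get flagged as outliers

print(z, low_fence, high_fence)
```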
B. Linear transformations & combining variables
| Rule | Formula | Notes |
| --- | --- | --- |
| Add a constant a | μ(X + a) = μ(X) + a; σ(X + a) = σ(X) | Shifts the center only |
| Multiply by b | μ(bX) = b·μ(X); σ(bX) = ∣b∣·σ(X) | Stretches/compresses the spread |
| Sum (any X, Y) | μ(X + Y) = μ(X) + μ(Y) | Always true |
| Sum (independent) | σ²(X + Y) = σ²(X) + σ²(Y) | Variances add, not SDs |
| Difference (independent) | σ²(X − Y) = σ²(X) + σ²(Y) | Still add the variances |
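The variance-addition rules are quick to apply numerically. A minimal sketch with made-up values (X: mean 50, SD 6; Y: mean 30, SD 8, assumed independent):

```python
from math import sqrt

# Hypothetical independent random variables:
mu_x, sd_x = 50, 6
mu_y, sd_y = 30, 8

mu_sum = mu_x + mu_y                  # means always add
sd_sum = sqrt(sd_x**2 + sd_y**2)      # variances add (needs independence), then take the root
sd_diff = sqrt(sd_x**2 + sd_y**2)     # same SD for X - Y: variances still ADD

print(mu_sum, sd_sum, sd_diff)
```

Note the classic trap the last line avoids: SDs never add directly, and subtracting variables never subtracts variance.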
C. Probability essentials
| Rule | Formula | Use | Notes |
| --- | --- | --- | --- |
| Complement | P(Aᶜ) = 1 − P(A) | “At least one” | Often the fastest route |
| Addition rule | P(A ∪ B) = P(A) + P(B) − P(A ∩ B) | Two events | If disjoint, the intersection = 0 |
| Conditional | P(A ∣ B) = P(A ∩ B)/P(B) | Given information | Restricts the sample space |
| Independence | P(A ∩ B) = P(A)·P(B) | Checking independence | Equivalent to P(A ∣ B) = P(A) |
| Bayes | P(A ∣ B) = P(B ∣ A)·P(A)/P(B) | Reversing the condition | Tree diagrams help |
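Bayes’ rule plus the two-branch tree can be checked on a toy screening problem (all rates invented: 2% of parts defective, a 95% true-positive rate, a 10% false-positive rate):

```python
# Hypothetical screening example for Bayes' rule.
p_d = 0.02                 # P(defective)
p_pos_given_d = 0.95       # P(+ | defective)
p_pos_given_good = 0.10    # P(+ | good)

# Law of total probability: the two tree branches that end in "+"
p_pos = p_pos_given_d * p_d + p_pos_given_good * (1 - p_d)

# Bayes: reverse the conditioning to get P(defective | +)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_d_given_pos, 4))
```

Even with a 95% true-positive rate, most positives come from the large “good” branch, which is exactly why drawing the tree first helps.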
D. Discrete random variables (AP favorites)
| Model | Probability | Mean | SD | Conditions/Notes |
| --- | --- | --- | --- | --- |
| Binomial, X ~ Bin(n, p) | P(X = k) = C(n, k)·pᵏ(1 − p)ⁿ⁻ᵏ | μ = np | σ = √(np(1 − p)) | BINS: Binary, Independent, Number fixed, Same p |
| Geometric, X ~ Geom(p) | P(X = k) = (1 − p)ᵏ⁻¹·p | μ = 1/p | σ = √((1 − p)/p²) | Counts trials until the first success |
| Expected value | μ_X = E(X) = Σ x·P(x) | — | — | Any discrete RV; use for the “long-run average” |
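The binomial and geometric formulas from the table, evaluated on hypothetical values (n = 10, p = 0.3):

```python
from math import comb, sqrt

# Hypothetical binomial: X ~ Bin(n = 10, p = 0.3)
n, p = 10, 0.3
pmf_3 = comb(n, 3) * p**3 * (1 - p)**7   # P(X = 3) = C(10, 3) p^3 (1-p)^7
mu = n * p                                # mean np
sigma = sqrt(n * p * (1 - p))             # SD sqrt(np(1-p))

# Hypothetical geometric with the same p: first success on trial k = 4
k = 4
geom_pmf = (1 - p)**(k - 1) * p           # (1-p)^(k-1) * p
geom_mean = 1 / p                         # expected number of trials

print(round(pmf_3, 4), mu, round(sigma, 3), round(geom_pmf, 4))
```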
E. Normal + sampling distributions
| Idea | Formula | When it applies | Notes |
| --- | --- | --- | --- |
| Normal model | X ~ N(μ, σ) | Given as approximately Normal | Standardize to use the Normal CDF |
| Sample mean x̄ | mean μ, SD σ/√n | SRS; Normal population or large n | CLT: large n makes x̄ approximately Normal |
| Sample proportion p̂ | mean p, SD √(p(1 − p)/n) | Large counts | For inference, check large counts |
| Large counts (one proportion) | np ≥ 10 and n(1 − p) ≥ 10 | Normal approximation for p̂ | For tests, use p₀ in the check |
| Large counts (two proportions) | n₁p₁ ≥ 10, n₁(1 − p₁) ≥ 10, n₂p₂ ≥ 10, n₂(1 − p₂) ≥ 10 | Two-proportion intervals | For tests, often use the pooled p̂ |
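A sampling-distribution calculation with invented numbers (population mean 100, SD 15, SRS of n = 36) shows how the σ/√n row gets used:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population: mean 100, SD 15; SRS of size n = 36.
mu, sigma, n = 100, 15, 36

se = sigma / sqrt(n)                       # SD of the sampling distribution of x-bar
# P(x-bar > 104), using the Normal model the CLT justifies for n = 36
p = 1 - NormalDist(mu, se).cdf(104)

print(round(se, 2), round(p, 4))
```

The key contrast: a single value 104 is only (104 − 100)/15 ≈ 0.27 SDs above the mean, but a sample *mean* of 104 is 1.6 SEs above it, so it is much more surprising.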
F. Confidence intervals (CI) and test statistics (most-used)
One proportion
| Task | Formula | Notes |
| --- | --- | --- |
| CI for p | p̂ ± z*·√(p̂(1 − p̂)/n) | Use p̂ in the SE |
| Test for p | z = (p̂ − p₀)/√(p₀(1 − p₀)/n) | Use p₀ in the SE |
Two proportions (independent)
| Task | Formula | Notes |
| --- | --- | --- |
| CI for p₁ − p₂ | (p̂₁ − p̂₂) ± z*·√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂) | Don’t pool for a CI |
| Test for p₁ − p₂ | z = (p̂₁ − p̂₂ − 0)/√(p̂(1 − p̂)(1/n₁ + 1/n₂)), where the pooled p̂ = (x₁ + x₂)/(n₁ + n₂) | Pool only in the test, under H₀: p₁ = p₂ |
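Here is the pooled two-proportion test on hypothetical counts (46 of 200 in group 1 vs. 30 of 200 in group 2); note that the pooled p̂ appears only in the SE of the test statistic:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts from two independent random samples.
x1, n1, x2, n2 = 46, 200, 30, 200
p1, p2 = x1 / n1, x2 / n2

p_pool = (x1 + x2) / (n1 + n2)              # pooled p-hat: used ONLY in the test
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                          # test statistic under H0: p1 = p2
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_value, 3))
```

A matching CI would swap the SE for the unpooled version from the row above.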
One mean (quantitative)
| Task | Formula | Notes |
| --- | --- | --- |
| CI for μ | x̄ ± t*·s/√n, with df = n − 1 | Use when σ is unknown (the usual case) |
| Test for μ | t = (x̄ − μ₀)/(s/√n), with df = n − 1 | Check approximately Normal / no strong skew or outliers |
Two means (independent samples)
| Task | Formula | Notes |
| --- | --- | --- |
| CI for μ₁ − μ₂ | (x̄₁ − x̄₂) ± t*·√(s₁²/n₁ + s₂²/n₂) | Calculator uses a df approximation |
| Test for μ₁ − μ₂ | t = (x̄₁ − x̄₂ − 0)/√(s₁²/n₁ + s₂²/n₂) | Don’t pool SDs in AP Stats |
Matched pairs (paired data)
  • Compute the differences dᵢ = x₁ᵢ − x₂ᵢ.
  • Then do a one-sample t on the differences:
    • d̄ ± t*·s_d/√n and t = (d̄ − μ₀)/(s_d/√n), with df = n − 1 (the hypothesized mean difference μ₀ is usually 0).
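The matched-pairs recipe on invented pre/post scores for 8 students (the critical value t* = 2.365 for df = 7 is read from a t table):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pre/post scores for the SAME 8 students (paired data).
pre  = [70, 65, 80, 75, 60, 85, 72, 68]
post = [74, 70, 82, 80, 62, 88, 75, 71]

d = [b - a for a, b in zip(pre, post)]      # differences, post - pre
n = len(d)
d_bar, s_d = mean(d), stdev(d)              # mean and sample SD of the differences

t = (d_bar - 0) / (s_d / sqrt(n))           # one-sample t on differences, df = n - 1
t_star = 2.365                              # t* for 95% confidence, df = 7 (t table)
ci = (d_bar - t_star * s_d / sqrt(n), d_bar + t_star * s_d / sqrt(n))

print(round(t, 2), [round(v, 2) for v in ci])
```

Running a two-sample t on `pre` and `post` here would use the (much larger) between-student variability and could easily miss the consistent gain.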
G. Chi-square procedures
| Procedure | Statistic | df | Conditions | Notes |
| --- | --- | --- | --- | --- |
| GOF | χ² = Σ (O − E)²/E | k − 1 | Random; expected counts typically ≥ 5 | E = n × p from the claimed model |
| Independence / Homogeneity | χ² = Σ (O − E)²/E | (r − 1)(c − 1) | Random; expected counts typically ≥ 5 | E = (row total)(column total)/n |
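Expected counts, the χ² statistic, and df for a hypothetical 2×3 two-way table (pass/fail by seat location, all counts invented):

```python
# Hypothetical two-way table of observed counts:
observed = [[30, 25, 15],     # pass: front, middle, back
            [10, 15, 25]]     # fail: front, middle, back

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected counts: (row total)(column total) / n
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 2), df)   # compare chi2 to a chi-square critical value with this df
```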
H. Regression (least squares + inference)
| Quantity | Formula | Notes |
| --- | --- | --- |
| LSRL | ŷ = a + bx | Predict y from x |
| Slope | b = r·s_y/s_x | Sign matches r |
| Intercept | a = ȳ − b·x̄ | The line passes through (x̄, ȳ) |
| Residual | e = y − ŷ | Positive residual = point above the line |
| Correlation | −1 ≤ r ≤ 1 | No units; linear strength only |
| Coefficient of determination | r² | Percent of variability in y explained by the linear model with x |
| Slope test | t = (b − 0)/SE_b, with df = n − 2 | Tests H₀: β = 0 |
| CI for slope | b ± t*·SE_b | Interpret as the change in mean response per unit of x |

Regression conditions (LINER): Linear pattern, Independent, Normal residuals, Equal variance, Random.
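The slope/intercept identities from summary statistics, on invented values (r = 0.8, s_x = 2, s_y = 5, x̄ = 4, ȳ = 70, and a made-up SE_b = 0.6 as if read off computer output for n = 20):

```python
# Hypothetical summary statistics for a regression problem.
r, s_x, s_y = 0.8, 2.0, 5.0
x_bar, y_bar = 4.0, 70.0

b = r * s_y / s_x          # slope: b = r * s_y / s_x
a = y_bar - b * x_bar      # intercept: the line passes through (x-bar, y-bar)
r_sq = r ** 2              # fraction of variability in y explained by the line

# Slope t statistic, with a hypothetical SE_b as shown on computer output
se_b = 0.6
t = (b - 0) / se_b         # compare to t with df = n - 2 = 18

print(b, a, round(r_sq, 2), round(t, 2))
```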

I. Inference vocabulary (quick definitions)
  • P-value: the probability (assuming H₀ is true) of getting a statistic as extreme as, or more extreme than, the one observed.
  • Type I error: rejecting a true H₀ (false positive). Probability = α.
  • Type II error: failing to reject a false H₀ (false negative). Probability = β.
  • Power: 1 − β, the probability of correctly rejecting a false H₀.

Examples & Applications

Example 1: Two-proportion z interval (wording trap)

Situation: Compare vaccination rates in School A vs School B.

  • Parameter: p_A − p_B = true difference in vaccination proportions.
  • Use a CI:
    • (p̂_A − p̂_B) ± z*·√(p̂_A(1 − p̂_A)/n_A + p̂_B(1 − p̂_B)/n_B)
      Key insight: If the CI contains 0, a “difference” claim isn’t supported.
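The interval from Example 1 on invented counts (160 of 200 vaccinated at School A vs. 140 of 200 at School B, 95% confidence):

```python
from math import sqrt

# Hypothetical data for the two schools; z* = 1.96 for 95% confidence.
pA, nA = 160 / 200, 200
pB, nB = 140 / 200, 200
z_star = 1.96

# Unpooled SE -- never pool for a confidence interval
se = sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB)
lo, hi = (pA - pB) - z_star * se, (pA - pB) + z_star * se

print(round(lo, 3), round(hi, 3))
# Since 0 is NOT inside this interval, these (made-up) data would support a difference.
```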
Example 2: Matched pairs vs two-sample (super common)

Situation: Same students take a pretest and posttest.

  • Don’t do a two-sample t.
  • Compute dᵢ = post − pre, then do a one-sample t on μ_d.
  • Test statistic: t = (d̄ − 0)/(s_d/√n).
    Key insight: Pairing reduces variability; ignoring the pairing can hide real effects.
Example 3: Chi-square independence (interpretation)

Situation: Is seat location (front/middle/back) related to passing (yes/no)?

  • Parameter: whether the two categorical variables are independent in the population.
  • Expected count: E = (row total)(column total)/n.
  • Statistic: χ² = Σ (O − E)²/E, with df = (r − 1)(c − 1).
    Key insight: A significant result says “associated,” not “causes.”
Example 4: Regression slope inference (what you conclude)

Situation: Predict exam score from hours studied.

  • Test H₀: β = 0 vs. Hₐ: β > 0.
  • Compute t = b/SE_b with df = n − 2.
  • Conclusion in context: “There is convincing evidence of a positive linear relationship between hours studied and mean exam score.”
    Key insight: You’re making a claim about the mean response changing with x, not about individual predictions being perfect.

Common Mistakes & Traps

  1. Mistake: Hypotheses about p̂ or x̄ instead of p or μ.

    • Why wrong: sample statistics are random; parameters are fixed truths.
    • Fix: define the parameter first, then write H₀ and Hₐ about it.
  2. Mistake: Using t vs. z incorrectly.

    • Why wrong: means with unknown σ require t; proportions use z.
    • Fix: quantitative → t; categorical → z.
  3. Mistake: Pooling in a two-proportion CI.

    • Why wrong: pooling assumes p₁ = p₂, which is exactly what you’re estimating in a CI.
    • Fix: Pool only for the hypothesis test of H₀: p₁ − p₂ = 0.
  4. Mistake: Skipping or mis-checking large counts.

    • Why wrong: Normal approximation can fail badly with small expected successes/failures.
    • Fix: For one-proportion tests use p₀; for intervals use p̂.
  5. Mistake: Treating matched pairs as independent samples.

    • Why wrong: within-person pairing creates dependence; you must analyze differences.
    • Fix: If the same subject is measured twice (or units are paired), do a one-sample t on the differences d.
  6. Mistake: Wrong chi-square df / expected counts.

    • Why wrong: df controls the reference distribution; wrong df → wrong P-value.
    • Fix: GOF df = k − 1; two-way tables df = (r − 1)(c − 1); compute E using row/column totals.
  7. Mistake: Regression conclusion implies causation.

    • Why wrong: observational studies can have confounding.
    • Fix: Only randomized experiments justify cause-and-effect.
  8. Mistake: “No significance” = “proved equal.”

    • Why wrong: failing to reject H0H_0 means insufficient evidence, not proof.
    • Fix: say “not enough evidence to conclude…”

Memory Aids & Quick Tricks

| Trick / mnemonic | Helps you remember | When to use |
| --- | --- | --- |
| SOCS | Shape, Outliers, Center, Spread | Describing distributions fast |
| BINS | Binomial conditions: Binary, Independent, Number fixed, Same p | Deciding binomial vs. not |
| 10% condition | Independence when sampling without replacement | Any sampling inference |
| PLAN | Parameter, Label (hypotheses), Assumptions/conditions, Name test/interval | Any inference FRQ write-up |
| “Pool for test, not for CI” | Two-proportion pooling rule | Two-proportion inference |
| LINER | Linear, Independent, Normal residuals, Equal variance, Random | Regression inference |
| “df = n − 1, n − 2, (r − 1)(c − 1)” | df for one-sample t, regression slope, chi-square table | Don’t lose df points |
| CUSS | Chi-square: Counts, Use expected, Sum of (O − E)²/E, Shape is right-skewed | Chi-square setup + interpretation |

Quick Review Checklist

  • [ ] You defined the parameter (with population + context).
  • [ ] Your H₀ and Hₐ are about the parameter, and the direction matches the prompt.
  • [ ] You checked Random, 10%, and the correct Normal/Large Counts condition.
  • [ ] Proportions: z procedures; means: t procedures; paired data: analyze the differences.
  • [ ] The two-proportion test uses the pooled p̂; the two-proportion CI does not.
  • [ ] You used the correct df: n − 1 (one-sample/paired t), n − 2 (regression slope), (r − 1)(c − 1) (chi-square table).
  • [ ] Your conclusion is in context and matches the decision: reject vs fail to reject.
  • [ ] You didn’t claim causation unless it was a randomized experiment.

You’ve got this—run the checklist on every inference question and you’ll avoid the biggest point leaks.