Last Minute AP Statistics Cheat Sheet (WITH FORMULAS)

What You Need to Know

This is the high-yield formula + procedure sheet to use when you’re trying to (1) pick the right method fast, (2) check conditions correctly, and (3) write the shortest inference “story” that still earns full credit (parameter → conditions → compute → conclude in context).

Big AP Stats idea: almost every FRQ is either describing data, probability/random variables, or inference (confidence interval or significance test). The fastest way to lose points is skipping conditions or failing to define the parameter.

Golden rule: Your hypotheses and conclusion must be about a population parameter (like p, \mu, \mu_1-\mu_2, \beta), not about sample statistics (like \hat p, \bar x).


Step-by-Step Breakdown

A. Picking the right inference procedure (fast decision tree)

  1. Is your response variable categorical (yes/no) or quantitative (number)?
    • Categorical → proportions, chi-square.
    • Quantitative → means, t-procedures, regression.
  2. How many groups/samples?
    • 1 sample → one-proportion z or one-mean t.
    • 2 independent samples → two-proportion z or two-sample t.
    • Matched pairs → one-sample t on differences.
  3. Are you comparing distributions of categories across groups?
    • One categorical variable vs a claimed model → chi-square GOF.
    • Two categorical variables (relationship) → chi-square independence.
    • Several populations/treatments and one categorical response → chi-square homogeneity.
  4. Is it a relationship between two quantitative variables?
    • Use linear regression; inference about slope \beta uses a t test/interval with df=n-2.

B. Writing any inference solution (full-credit skeleton)

  1. Define the parameter (in context).
    • Example: p = true proportion of all students at your school who…
  2. State hypotheses (test only).
    • H_0: p=p_0, H_a: p\ne p_0 (or H_a: p<p_0 or H_a: p>p_0 for a one-sided test).
  3. Check conditions (name + verify with given info).
    • Random, 10% condition, Normal/Large Counts, etc.
  4. Compute the test statistic or interval (show formula + plug values).
  5. P-value OR critical value method (usually P-value on AP).
  6. Conclude in context at level \alpha.
    • “Because p\text{-value} < \alpha, reject H_0. There is convincing evidence that …”

C. Mini worked walkthrough (one-proportion z test)

Prompt style: “Is there evidence the true proportion differs from 0.40?”

  1. Parameter: p = true proportion of (population) who …
  2. Hypotheses: H_0: p=0.40, H_a: p\ne 0.40
  3. Conditions:
    • Random: stated random sample/assignment.
    • 10%: n \le 0.1N (if sampling without replacement).
    • Large counts: np_0\ge 10 and n(1-p_0)\ge 10.
  4. Compute:
    • \hat p = \dfrac{x}{n}
    • z=\dfrac{\hat p - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}
  5. Get P-value from Normal.
  6. Conclude in context.
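The walkthrough above can be sketched in code. A minimal Python version of the one-proportion z test (the sample counts below are made up for illustration), using the error function for the standard Normal CDF:

```python
import math

def one_prop_z_test(x, n, p0):
    """Two-sided one-proportion z test; returns (p_hat, z, p_value)."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)            # SE uses p0 under H0
    z = (p_hat - p0) / se
    # Standard Normal CDF via the error function
    cdf = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    p_value = 2 * (1 - cdf(abs(z)))              # two-sided
    return p_hat, z, p_value

# Hypothetical data: 52 successes in 100 trials, H0: p = 0.40
p_hat, z, p_val = one_prop_z_test(52, 100, 0.40)
```

With these numbers z ≈ 2.45, so the P-value is small and you would reject H_0 at α = 0.05 and conclude in context.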

Key Formulas, Rules & Facts

A. Describing data (quick hits)

| Tool | Formula | When to use | Notes |
|---|---|---|---|
| Standard score | z=\dfrac{x-\mu}{\sigma} or z=\dfrac{x-\bar x}{s} | Compare a value to the distribution’s center/spread | “How many SDs from the mean?” |
| Outlier rule | Below Q_1-1.5(IQR) or above Q_3+1.5(IQR) | Boxplots/outliers | Not “proof,” just a flag |
| Density/probability | Area under the curve | Continuous models | Probability = area |

B. Linear transformations & combining variables

| Rule | Formula | Notes |
|---|---|---|
| Add a constant a | \mu_{X+a}=\mu_X+a, \sigma_{X+a}=\sigma_X | Shifts the center only |
| Multiply by b | \mu_{bX}=b\mu_X, \sigma_{bX}=\lvert b\rvert\sigma_X | Stretches/compresses the spread |
| Sum (any) | \mu_{X+Y}=\mu_X+\mu_Y | Always true |
| Sum (independent) | \sigma^2_{X+Y}=\sigma_X^2+\sigma_Y^2 | Variances add, not SDs |
| Difference (independent) | \sigma^2_{X-Y}=\sigma_X^2+\sigma_Y^2 | Variances still add |
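A quick numeric check of the combining rules, with made-up means and SDs:

```python
import math

def combine(mu_x, sd_x, mu_y, sd_y, independent=True):
    """Mean and SD of X + Y: means always add; variances add only if independent."""
    mu_sum = mu_x + mu_y
    if not independent:
        raise ValueError("SD of a sum needs independence (or the covariance)")
    sd_sum = math.sqrt(sd_x**2 + sd_y**2)        # add variances, then square-root
    return mu_sum, sd_sum

# X - Y for independent X, Y: the mean subtracts, but variances STILL add
mu_x, sd_x, mu_y, sd_y = 50, 3, 20, 4            # illustrative values
mu_diff = mu_x - mu_y                            # 30
sd_diff = math.sqrt(sd_x**2 + sd_y**2)           # sqrt(9 + 16) = 5, NOT 3 - 4
```

The common trap is subtracting SDs for X − Y; the last line shows why the SD of the difference (5) is larger than either SD alone.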

C. Probability essentials

| Rule | Formula | Use | Notes |
|---|---|---|---|
| Complement | P(A^c)=1-P(A) | “At least one” | Often fastest |
| Addition rule | P(A\cup B)=P(A)+P(B)-P(A\cap B) | Two events | If disjoint, the intersection is 0 |
| Conditional | P(A\mid B)=\dfrac{P(A\cap B)}{P(B)} | Given info | Restricts the sample space |
| Independence | P(A\cap B)=P(A)P(B) | Check independence | Equivalent to P(A\mid B)=P(A) |
| Bayes | P(A\mid B)=\dfrac{P(B\mid A)P(A)}{P(B)} | Reverse a condition | Tree diagrams help |

D. Discrete random variables (AP favorites)

| Model | Probability | Mean | SD | Conditions/Notes |
|---|---|---|---|---|
| Binomial X\sim Bin(n,p) | P(X=k)=\binom{n}{k}p^k(1-p)^{n-k} | \mu=np | \sigma=\sqrt{np(1-p)} | BINS: Binary, Independent, Number fixed, Same p |
| Geometric X\sim Geom(p) | P(X=k)=(1-p)^{k-1}p | \mu=\dfrac{1}{p} | \sigma=\sqrt{\dfrac{1-p}{p^2}} | Counts trials until first success |
| Expected value (any discrete RV) | — | \mu_X=E(X)=\sum x\,P(x) | — | Use for “long-run average” |
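The binomial and geometric formulas, sketched in Python with illustrative values of n and p:

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def geom_pmf(k, p):
    """P(X = k) for X ~ Geom(p): first success on trial k."""
    return (1 - p)**(k - 1) * p

n, p = 10, 0.3                                  # made-up example
binom_mean = n * p                              # mu = np
binom_sd = math.sqrt(n * p * (1 - p))           # sigma = sqrt(np(1-p))
geom_mean = 1 / p                               # mu = 1/p: expect ~3.3 trials
```

For example, P(X = 3) for Bin(10, 0.3) comes out near 0.267, and the “long-run average” number of trials until a first success at p = 0.3 is 1/0.3 ≈ 3.3.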

E. Normal + sampling distributions

| Idea | Formula | When it applies | Notes |
|---|---|---|---|
| Normal model | X\sim N(\mu,\sigma) | Given approx. Normal | Standardize to use the Normal CDF |
| Sample mean | \mu_{\bar x}=\mu, \sigma_{\bar x}=\dfrac{\sigma}{\sqrt{n}} | SRS; Normal population or large n | CLT: large n makes \bar x approx. Normal |
| Sample proportion | \mu_{\hat p}=p, \sigma_{\hat p}=\sqrt{\dfrac{p(1-p)}{n}} | Large counts | For inference, check large counts |
| Large counts (one prop) | np\ge 10 and n(1-p)\ge 10 | Normal approx. for \hat p | For tests, use p_0 in the check |
| Large counts (two prop) | n_1p_1\ge 10, n_1(1-p_1)\ge 10, n_2p_2\ge 10, n_2(1-p_2)\ge 10 | Two-prop intervals | For tests, often use the pooled \hat p |
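The sampling-distribution SDs and the large-counts check, as a small Python sketch (the numbers are illustrative):

```python
import math

def sd_of_sample_mean(sigma, n):
    """sigma_xbar = sigma / sqrt(n); needs the 10% condition if sampling w/o replacement."""
    return sigma / math.sqrt(n)

def sd_of_sample_prop(p, n):
    """sigma_phat = sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def large_counts_ok(p, n):
    """Normal approximation check for p-hat: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

sd_xbar = sd_of_sample_mean(sigma=15, n=36)     # 15 / 6 = 2.5
sd_phat = sd_of_sample_prop(p=0.4, n=100)       # sqrt(0.24/100) ≈ 0.049
```

Note how quadrupling n only halves either SD, which is why “just collect a bit more data” rarely fixes a wide interval.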

F. Confidence intervals (CI) and test statistics (most-used)

One proportion

| Task | Formula | Notes |
|---|---|---|
| CI for p | \hat p \pm z^*\sqrt{\dfrac{\hat p(1-\hat p)}{n}} | Use \hat p in the SE |
| Test for p | z=\dfrac{\hat p-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} | Use p_0 in the SE |

Two proportions (independent)

| Task | Formula | Notes |
|---|---|---|
| CI for p_1-p_2 | (\hat p_1-\hat p_2) \pm z^*\sqrt{\dfrac{\hat p_1(1-\hat p_1)}{n_1}+\dfrac{\hat p_2(1-\hat p_2)}{n_2}} | Don’t pool for a CI |
| Test for p_1-p_2 | z=\dfrac{(\hat p_1-\hat p_2)-0}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}, where \hat p=\dfrac{x_1+x_2}{n_1+n_2} | Pool only in the test, under H_0: p_1=p_2 |

One mean (quantitative)

| Task | Formula | Notes |
|---|---|---|
| CI for \mu | \bar x \pm t^*\dfrac{s}{\sqrt{n}} with df=n-1 | Use when \sigma is unknown (the usual case) |
| Test for \mu | t=\dfrac{\bar x-\mu_0}{s/\sqrt{n}} with df=n-1 | Check approx. Normal / no strong skew or outliers |

Two means (independent samples)

| Task | Formula | Notes |
|---|---|---|
| CI for \mu_1-\mu_2 | (\bar x_1-\bar x_2) \pm t^*\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}} | Calculator uses a df approximation |
| Test for \mu_1-\mu_2 | t=\dfrac{(\bar x_1-\bar x_2)-0}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}} | Don’t pool SDs in AP Stats |

Matched pairs (paired data)

  • Compute differences d_i = x_{1i}-x_{2i}.
  • Then do a one-sample t on the differences:
    • \bar d \pm t^*\dfrac{s_d}{\sqrt{n}} and t=\dfrac{\bar d-\mu_{d,0}}{s_d/\sqrt{n}} with df=n-1.
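The pooled two-proportion test statistic (pool for the test only, never the CI) can be sketched as follows; the counts are made up:

```python
import math

def two_prop_z_test(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled p-hat in the SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)               # pool ONLY for the test, never the CI
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical counts: 60/100 vs 45/100 successes
z = two_prop_z_test(60, 100, 45, 100)
```

Here the pooled \hat p is 105/200 = 0.525 and z ≈ 2.12, so at α = 0.05 you would reject H_0: p_1 = p_2.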

G. Chi-square procedures

| Procedure | Statistic | df | Conditions | Notes |
|---|---|---|---|---|
| GOF | \chi^2=\sum \dfrac{(O-E)^2}{E} | k-1 | Random; expected counts typically \ge 5 | E=n\times p_{model} |
| Independence/Homogeneity | \chi^2=\sum \dfrac{(O-E)^2}{E} | (r-1)(c-1) | Random; expected counts typically \ge 5 | E=\dfrac{(row\ total)(col\ total)}{n} |
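A sketch of the chi-square computation for a two-way table, using a hypothetical 2×2 table of counts:

```python
def chi_square_two_way(observed):
    """Chi-square statistic and df for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # E = (row total)(col total)/n
            stat += (o - e) ** 2 / e                # sum of (O-E)^2 / E
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical 2x2 table: every expected count works out to 25
stat, df = chi_square_two_way([[30, 20], [20, 30]])
```

With these counts the statistic is 4.0 on df = (2−1)(2−1) = 1; compare to the chi-square distribution with that df for the P-value.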

H. Regression (least squares + inference)

| Quantity | Formula | Notes |
|---|---|---|
| LSRL | \hat y=a+bx | Predict y from x |
| Slope | b=r\dfrac{s_y}{s_x} | Sign matches r |
| Intercept | a=\bar y-b\bar x | Line goes through \left(\bar x,\bar y\right) |
| Residual | e=y-\hat y | Positive residual = point above the line |
| Correlation | -1\le r\le 1 | No units; linear strength only |
| Coefficient of determination | r^2 | Percent of variability in y explained by the linear model with x |
| Slope test | t=\dfrac{b-0}{SE_b}, df=n-2 | Test H_0: \beta=0 |
| CI for slope | b\pm t^*SE_b | Interpret as change in mean response per unit of x |

Regression conditions (LINER): Linear pattern, Independent, Normal residuals, Equal variance, Random.

I. Inference vocabulary (quick definitions)

  • P-value: probability (assuming H_0 true) of getting a statistic as extreme or more extreme than observed.
  • Type I error: reject true H_0 (false positive). Probability =\alpha.
  • Type II error: fail to reject false H_0 (false negative). Probability =\beta.
  • Power: 1-\beta.

Examples & Applications

Example 1: Two-proportion z interval (wording trap)

Situation: Compare vaccination rates in School A vs School B.

  • Parameter: p_A-p_B = true difference in vaccination proportions.
  • Use CI:
    • \left(\hat p_A-\hat p_B\right) \pm z^*\sqrt{\dfrac{\hat p_A(1-\hat p_A)}{n_A}+\dfrac{\hat p_B(1-\hat p_B)}{n_B}}
      Key insight: If CI contains 0, a “difference” claim isn’t supported.
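The interval and the “contains 0” check can be sketched as follows (the vaccination counts are hypothetical):

```python
import math

def two_prop_ci(x1, n1, x2, n2, z_star=1.96):
    """CI for p1 - p2 with the unpooled SE; z* = 1.96 gives 95% confidence."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # do NOT pool for a CI
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: School A 80/100 vaccinated, School B 72/100
lo, hi = two_prop_ci(80, 100, 72, 100)
contains_zero = lo <= 0 <= hi      # if True, a "difference" claim isn't supported
```

Here the point estimate is 0.08 but the 95% interval stretches from about −0.04 to 0.20, so 0 is plausible and you cannot claim the schools’ rates differ.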

Example 2: Matched pairs vs two-sample (super common)

Situation: Same students take a pretest and posttest.

  • Don’t do two-sample t.
  • Compute d_i=post-pre, then one-sample t on \mu_d.
  • Test statistic: t=\dfrac{\bar d-0}{s_d/\sqrt{n}}.
    Key insight: Pairing reduces variability; ignoring pairing can hide effects.
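The matched-pairs procedure, sketched with invented pre/post scores:

```python
import math
import statistics

def paired_t(pre, post):
    """Matched pairs: a one-sample t on the differences d = post - pre."""
    d = [b - a for a, b in zip(pre, post)]
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)                    # sample SD of the differences
    n = len(d)
    t = (d_bar - 0) / (s_d / math.sqrt(n))       # H0: mu_d = 0
    return t, n - 1                              # statistic and df = n - 1

# Hypothetical scores for the same 5 students
pre  = [70, 65, 80, 75, 60]
post = [74, 70, 82, 80, 66]
t, df = paired_t(pre, post)
```

Every student improved, so the differences have a small SD and t is large (about 6.5 on df = 4) even though the two raw score lists overlap heavily; that is the pairing advantage.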

Example 3: Chi-square independence (interpretation)

Situation: Is seat location (front/middle/back) related to passing (yes/no)?

  • Hypotheses: H_0: seat location and passing are independent in the population vs. H_a: they are not independent (no single numeric parameter here).
  • Expected count: E=\dfrac{(row\ total)(col\ total)}{n}.
  • Statistic: \chi^2=\sum \dfrac{(O-E)^2}{E}, df=(r-1)(c-1).
    Key insight: A significant result says “associated,” not “causes.”

Example 4: Regression slope inference (what you conclude)

Situation: Predict exam score from hours studied.

  • Test H_0: \beta=0 vs H_a: \beta>0.
  • Compute t=\dfrac{b}{SE_b} with df=n-2.
  • Conclusion in context: “There is convincing evidence of a positive linear relationship between hours studied and mean exam score.”
    Key insight: You’re making a claim about mean response changing with x, not about individual predictions being perfect.

Common Mistakes & Traps

  1. Mistake: Hypotheses about \hat p or \bar x instead of p or \mu.

    • Why wrong: sample stats are random; parameters are fixed truths.
    • Fix: define parameter first, then write H_0 and H_a about it.
  2. Mistake: Using t vs z incorrectly.

    • Why wrong: means with unknown \sigma require t; proportions use z.
    • Fix: quantitative → t, categorical → z.
  3. Mistake: Pooling in a two-proportion CI.

    • Why wrong: pooling assumes p_1=p_2, which is exactly what you’re estimating in a CI.
    • Fix: Pool only for the hypothesis test of p_1-p_2=0.
  4. Mistake: Skipping or mis-checking large counts.

    • Why wrong: Normal approximation can fail badly with small expected successes/failures.
    • Fix: For one-prop tests use p_0; for intervals use \hat p.
  5. Mistake: Treating matched pairs as independent samples.

    • Why wrong: within-person pairing creates dependence; you must analyze differences.
    • Fix: If the same subject is measured twice (or paired units), do one-sample t on d.
  6. Mistake: Wrong chi-square df / expected counts.

    • Why wrong: df controls the reference distribution; wrong df → wrong P-value.
    • Fix: GOF df=k-1; two-way tables df=(r-1)(c-1); compute E using row/col totals.
  7. Mistake: Regression conclusion implies causation.

    • Why wrong: observational studies can have confounding.
    • Fix: Only randomized experiments justify cause-and-effect.
  8. Mistake: “No significance” = “proved equal.”

    • Why wrong: failing to reject H_0 means insufficient evidence, not proof.
    • Fix: say “not enough evidence to conclude…”

Memory Aids & Quick Tricks

| Trick / mnemonic | Helps you remember | When to use |
|---|---|---|
| SOCS | Shape, Outliers, Center, Spread | Describing distributions fast |
| BINS | Binomial conditions: Binary, Independent, Number fixed, Same p | Deciding binomial vs. not |
| 10% condition | Independence when sampling without replacement | Any sampling inference |
| PLAN | Parameter, Label (hypotheses), Assumptions/conditions, Name test/interval | Any inference FRQ write-up |
| “Pool for test, not for CI” | Two-proportion pooling rule | Two-proportion inference |
| LINER | Linear, Independent, Normal residuals, Equal variance, Random | Regression inference |
| “df = n-1, n-2, (r-1)(c-1)” | df for one-sample t, regression slope, chi-square table | Don’t lose df points |
| CUSS | Chi-square: Counts, Use expected, Sum \dfrac{(O-E)^2}{E}, Shape is right-skewed | Chi-square setup + interpretation |

Quick Review Checklist

  • [ ] You defined the parameter (with population + context).
  • [ ] Your H_0 and H_a are about the parameter, and direction matches the prompt.
  • [ ] You checked Random, 10%, and the correct Normal/Large Counts condition.
  • [ ] Proportions: z procedures; Means: t procedures; Paired: analyze differences.
  • [ ] Two-prop test uses pooled \hat p; two-prop CI does not.
  • [ ] You used the correct df: n-1 (one-sample/paired t), n-2 (regression slope), (r-1)(c-1) (chi-square table).
  • [ ] Your conclusion is in context and matches the decision: reject vs fail to reject.
  • [ ] You didn’t claim causation unless it was a randomized experiment.

You’ve got this—run the checklist on every inference question and you’ll avoid the biggest point leaks.