Regression line
A mathematical model that describes how a response variable (y) tends to change as an explanatory variable (x) changes; summarizes an overall linear trend, not a perfect rule for every point.
Explanatory variable (x)
The quantitative variable used to explain or predict changes in another variable; the input of a regression model.
Response variable (y)
The quantitative variable being predicted or explained by x; the output of a regression model.
Least-squares regression line (LSRL)
The regression line that minimizes the sum of squared residuals; written as ŷ = a + bx.
Predicted value (ŷ, “y-hat”)
The value of y predicted by the regression equation for a given x.
Least squares
A fitting method that chooses the slope and intercept to make the overall vertical prediction errors as small as possible by minimizing the sum of squared residuals.
Residual (e)
The vertical prediction error for a point: e = y − ŷ.
Sum of squared residuals
The quantity minimized by the LSRL: Σ(y − ŷ)²; squaring prevents cancellation and penalizes large errors.
Slope (b) of the LSRL
The change in predicted y for a 1-unit increase in x; computed by b = r(sy/sx).
Intercept (a) of the LSRL
The predicted value of y when x = 0; computed by a = ȳ − b x̄; meaningful only if x = 0 is in a reasonable data range.
Correlation (r)
A measure of the direction and strength of linear association between x and y; its sign matches the sign of the regression slope.
Standard deviation of x (sx)
A measure of the spread of the explanatory variable x; used in the slope formula b = r(sy/sx).
Standard deviation of y (sy)
A measure of the spread of the response variable y; used in the slope formula b = r(sy/sx).
Point (x̄, ȳ)
The mean point of the data; the LSRL always passes through (x̄, ȳ).
Horizontal LSRL when r = 0
If r = 0, then b = 0 and the regression line is ŷ = ȳ (a horizontal line at the mean of y).
Slope interpretation
For each 1-unit increase in x, the predicted value of y changes by b units; state the interpretation in context, with the units of both variables.
Intercept interpretation
When x = 0, the predicted value of y is a; can be misleading if x = 0 is outside the observed x-range (extrapolation issue).
Coefficient of determination (r²)
The proportion of variability in y explained by the linear regression of y on x (e.g., r² = 0.64 means about 64% explained).
Residual plot
A graph of residuals versus x (or versus ŷ), showing points (x, e); used to assess whether a linear model is appropriate.
Random scatter around 0 (in a residual plot)
A desirable pattern indicating the linear model captures the main trend and leftover variation looks like random noise.
Nonlinearity (curvature)
A departure from linearity where residuals show a curved pattern (e.g., positive then negative then positive), suggesting a straight-line model is missing structure.
Nonconstant variance (changing spread)
A pattern where residuals “fan out” or “fan in,” indicating the variability of y changes across x and prediction reliability may vary by x.
Standard deviation of the residuals (s)
A measure of typical prediction error in y-units: s = sqrt(Σe²/(n−2)); describes typical distance of observed y values from the regression line.
High leverage point
A point with an extreme x-value compared to the rest of the data; can strongly pull the regression line because it is far out in the x-direction.
Influential point
A point that noticeably changes the regression line (slope and/or intercept) if removed; high leverage points are most likely to be influential, but not always.