Regression line
A mathematical model that describes how a response variable (y) tends to change as an explanatory variable (x) changes; summarizes an overall linear trend, not a perfect rule for every point.
Explanatory variable (x)
The quantitative variable used to explain or predict changes in another variable; the input of a regression model.
Response variable (y)
The quantitative variable being predicted or explained by x; the output of a regression model.
Least-squares regression line (LSRL)
The regression line that minimizes the sum of squared residuals; written as ŷ = a + bx.
Predicted value (ŷ, “y-hat”)
The value of y predicted by the regression equation for a given x.
Least squares
A fitting method that chooses the slope and intercept to make the overall vertical prediction errors as small as possible by minimizing the sum of squared residuals.
Residual (e)
The vertical prediction error for a point: e = y − ŷ.
Sum of squared residuals
The quantity minimized by the LSRL: Σ(y − ŷ)²; squaring prevents cancellation and penalizes large errors.
Slope (b) of the LSRL
The change in predicted y for a 1-unit increase in x; computed by b = r(sy/sx).
Intercept (a) of the LSRL
The predicted value of y when x = 0; computed by a = ȳ − b x̄; meaningful only if x = 0 is in a reasonable data range.
Correlation (r)
A measure of the direction and strength of linear association between x and y; its sign matches the sign of the regression slope.
Standard deviation of x (sx)
A measure of the spread of the explanatory variable x; used in the slope formula b = r(sy/sx).
Standard deviation of y (sy)
A measure of the spread of the response variable y; used in the slope formula b = r(sy/sx).
Point (x̄, ȳ)
The mean point of the data; the LSRL always passes through (x̄, ȳ).
Horizontal LSRL when r = 0
If r = 0, then b = 0 and the regression line is ŷ = ȳ (a horizontal line at the mean of y).
Slope interpretation
For each 1-unit increase in x, the predicted value of y changes by b units; state the interpretation in context, with the units of both variables.
Intercept interpretation
When x = 0, the predicted value of y is a; can be misleading if x = 0 is outside the observed x-range (extrapolation issue).
Coefficient of determination (r²)
The proportion of variability in y explained by the linear regression of y on x (e.g., r² = 0.64 means about 64% explained).
Residual plot
A graph of residuals versus x (or versus ŷ), showing points (x, e); used to assess whether a linear model is appropriate.
Random scatter around 0 (in a residual plot)
A desirable pattern indicating the linear model captures the main trend and leftover variation looks like random noise.
Nonlinearity (curvature)
A departure from linearity where residuals show a curved pattern (e.g., positive then negative then positive), suggesting a straight-line model is missing structure.
Nonconstant variance (changing spread)
A pattern where residuals “fan out” or “fan in,” indicating the variability of y changes across x and prediction reliability may vary by x.
Standard deviation of the residuals (s)
A measure of typical prediction error in y-units: s = sqrt(Σe²/(n−2)); describes typical distance of observed y values from the regression line.
High leverage point
A point with an extreme x-value compared to the rest of the data; can strongly pull the regression line because it is far out in the x-direction.
Influential point
A point that noticeably changes the regression line (slope and/or intercept) if removed; high leverage points are most likely to be influential, but not always.