Inference for linear regression
Using sample regression results to draw conclusions about the population relationship between two quantitative variables, especially about the population slope.
Population slope (β1)
The true average change in the response variable y for a one-unit increase in the explanatory variable x in the population.
Sample slope (b1 or b)
The slope from the least-squares regression line computed from sample data; an estimate of the population slope β1.
Least-squares regression line
The line (ŷ = b0 + b1x) that minimizes the sum of squared residuals and is used to predict y from x.
Predicted value (ŷ)
The value of y predicted by the regression line for a given x.
Residual (e)
The vertical difference between an observed value and the predicted value: e = y − ŷ.
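The least-squares line, predicted values, and residuals defined above can be sketched in Python. The data here are made up purely for illustration, not taken from the source:

```python
import numpy as np

# Hypothetical sample data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares slope and intercept: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

y_hat = b0 + b1 * x   # predicted values ŷ = b0 + b1x
e = y - y_hat         # residuals e = y − ŷ

print(b1, b0)         # slope and intercept of the fitted line
```

A useful sanity check: least-squares residuals always sum to zero (up to rounding), because the line passes through (x̄, ȳ).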
Simple linear regression model
A statistical model assuming y = β0 + β1x + ε, where ε is a random error term.
Error term (ε)
The random deviation from the true regression line in the model; assumed to have mean 0.
L.I.N.E. conditions
A checklist for regression inference: Linear, Independent, Normal (errors), and Equal variance of residuals.
Linearity condition
The relationship between x and y is approximately linear; checked with a scatterplot and residual plot (no curved pattern).
Independence condition
Observations are independent, typically justified by random sampling or random assignment (not proven by plots).
Normality (of errors) condition
For each fixed x, the errors (or y values) are approximately Normal; checked with residual histograms or Normal probability plots.
Equal variance condition
The spread of residuals is roughly constant across x; checked by looking for no “funnel shape” in the residual plot.
Degrees of freedom for slope inference (df = n − 2)
The t procedures for slope use n − 2 degrees of freedom because two parameters (β0 and β1) are estimated from the data.
t distribution (in regression)
The sampling distribution used for inference about β1 when conditions hold, with df = n − 2.
Standard error of the slope (SEb1)
Measures typical sample-to-sample variability in the estimated slope b1; smaller with less scatter, larger n, and more spread in x.
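The standard error of the slope can be computed by hand from the residuals: SEb1 = s / √Sxx, where s² = SSE / (n − 2). A self-contained sketch with made-up data (the same illustrative numbers as above):

```python
import numpy as np

# Hypothetical data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Fit the least-squares line.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# s estimates the standard deviation of the errors, using df = n - 2.
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# SE of the slope: less scatter (small s) or more spread in x (large Sxx)
# both make the slope estimate more stable.
se_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))
print(se_b1)
```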
Critical value (t*)
The t cutoff used to build a confidence interval at a given confidence level and df = n − 2.
Confidence interval for the slope
An interval estimate for β1 computed as b1 ± t*SEb1, giving plausible values for the true population slope.
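The interval b1 ± t*·SEb1 can be sketched end to end; this uses `scipy.stats.t.ppf` for the critical value and the same made-up data as the earlier sketches:

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
se_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))

# 95% confidence: t* cuts off the upper 2.5% with df = n - 2.
t_star = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
print(ci)
```

With only n = 5 points (df = 3), t* is much larger than the familiar z value of 1.96, which widens the interval accordingly.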
Interpretation of a slope confidence interval
A statement giving a range of plausible values for the average change in y per 1-unit increase in x in the population (a claim about the mean response, not about individual outcomes).
Slope test statistic
The t value for testing a claim about β1: t = (b1 − β1,0) / SEb1, where β1,0 is the hypothesized slope from H0 (usually 0), compared to a t distribution with df = n − 2.
Null hypothesis for slope (H0: β1 = 0)
A claim that the population slope is 0, meaning no linear association between x and y in the population.
Alternative hypothesis for slope (Ha)
A claim that the population slope differs from the null value (e.g., β1 ≠ 0, β1 > 0, or β1 < 0).
p-value (for slope test)
Assuming H0 is true, the probability of getting a slope estimate (or t) at least as extreme as observed, in the direction(s) of Ha.
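The test of H0: β1 = 0 against a two-sided Ha can be sketched by combining the pieces above; `scipy.stats.t.sf` gives the upper-tail probability, and the data remain the same illustrative numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
se_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))

# Test statistic for H0: beta1 = 0.
t_stat = (b1 - 0) / se_b1

# Two-sided p-value: double the upper-tail area beyond |t|, df = n - 2.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)
```

A one-sided Ha would use a single tail (`stats.t.sf(t_stat, df=n - 2)` for β1 > 0) instead of doubling.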
Association vs. causation (in regression)
A significant slope provides evidence of a linear association, but a causal conclusion is justified only when the data come from a randomized experiment (not merely an observational study).
Extrapolation
Using the regression model to make predictions or draw conclusions for x values outside the observed data range; such predictions are unreliable even when the inference results are strong.