Least Squares Regression Line (LSRL) Notes
Direction
- Positive: As x increases, y increases.
- Negative: As x increases, y decreases.
Outlier(s)
- Identify any points that do not follow the overall trend.
- Review 02.02 page 6 of 8.
Form
- Does the association follow a linear trend?
- Typical forms are linear, non-linear, or random scatter (no association).
Strength
- How closely do the points follow a visible form?
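- If the points lie close to a model (line or curve) that matches the form, the association is strong; if they are widely scattered around it, the association is weak.
- If the form is linear, report r (the correlation coefficient).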
Common Mistakes
- Correlation does NOT imply causation.
- When describing the association in a scatterplot, discuss direction, form, strength, and unusual features in the context of the problem, using variable names.
Transformations to Achieve Linearity
- The transformation to use depends on the form of the original association.
- Two common nonlinear forms: power (y = c·x^n) and exponential (y = c·a^x).
- Power: graph log(y) vs. log(x) (log(x) on the horizontal axis).
- Exponential: graph log(y) vs. x (x on the horizontal axis).
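Taking logarithms of each model shows why these transformations straighten the data. A short derivation, assuming the models are written as $y = c\,x^n$ and $y = c\,a^x$ with $x$, $c$, and $a$ positive (the constant $c$ is introduced here for illustration):

```latex
\begin{aligned}
\text{Power: } y = c\,x^{n} &\;\Longrightarrow\; \log y = \log c + n \log x \\
\text{Exponential: } y = c\,a^{x} &\;\Longrightarrow\; \log y = \log c + (\log a)\,x
\end{aligned}
```

For the power model, log(y) is a linear function of log(x) with slope n; for the exponential model, log(y) is a linear function of x itself with slope log(a). That is why each form calls for a different transformation.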
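Calculator Steps for Transformations (TI-83/84)
- Enter the explanatory variable (x) into L1 and the response variable (y) into L2.
- For power: store log(L1) in L3 and log(L2) in L4.
- For exponential: keep x in L1 and store log(L2) in L3.
- Make a scatterplot of the transformed data (log(y) vs. log(x) for power, log(y) vs. x for exponential) and check that it looks linear.
- Run LinReg(a+bx) on the transformed lists and check that the residual plot shows no visible pattern.
- Check that r-squared is closer to 1 than it was for the line fit to the original data.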
General Tips
- Always show your work.
- Round to four decimal places.
- Include units for both x- and y-variables.
- Use "predicted" or "estimated" when interpreting slope and y-intercept.
- When interpreting slope, say "for each additional [x in context]" or "for every one-unit increase in [x in context]".
- Define your variables (x and y-hat) in context (what each one stands for).
- Write answers in the context of the problem when interpreting slope, y-intercept, r, r-squared, residuals, etc.
- Residual signs can feel backward: a negative residual means the model overestimated the actual value, and a positive residual means it underestimated it.
Explanatory and Response Variables
- Explanatory variable: the variable used to explain or predict changes in another variable; also known as the independent variable (x).
- Response variable: the variable that measures the outcome (what is predicted) in response to the explanatory variable; also known as the dependent variable (y).
Calculator Steps for TI-83/84
- Press STAT and choose Edit.
- Enter the explanatory variable (x) in L1 and the response variable (y) in L2.
- Press STAT, arrow over to CALC, and choose LinReg(a+bx).
Slope and Y-Intercept Interpretation
- Slope interpretation: On average, for each additional [x in context], the predicted [y in context] increases (or decreases) by [b units].
- Y-intercept interpretation: When [x in context] is zero, the predicted [y in context] is [a units].
- To receive full credit, define the x and y variables in your LSRL.
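To see how the templates read once filled in, here is a small sketch with a hypothetical LSRL (the context, the numbers, and the `predict` helper are invented for illustration). It also checks that the slope is the change in predicted y for a one-unit increase in x.

```python
# Hypothetical LSRL (made up for illustration):
#   x = hours studied, y-hat = predicted quiz score (points)
a = 2.2  # y-intercept
b = 0.6  # slope


def predict(x):
    """Predicted y (y-hat) from the LSRL y-hat = a + b*x."""
    return a + b * x


direction = "increases" if b >= 0 else "decreases"
print(f"On average, for each additional hour studied, the predicted quiz score "
      f"{direction} by {abs(b)} points.")
print(f"When hours studied is zero, the predicted quiz score is {a} points.")

# The slope is the change in y-hat for a one-unit increase in x.
print(f"predict(5) - predict(4) = {predict(5) - predict(4):.4f}")
```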
Residual Values
- Residual = actual y - predicted y (that is, y - ŷ).
- To interpret: the residual tells how much the model overestimated or underestimated the actual [y in context] for that value of [x in context].
- Tip: Always show all work when calculating a residual and include units.
- Be careful, a positive residual is an UNDERestimate and a negative residual is an OVERestimate.
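A minimal sketch of the residual calculation and its sign rule, assuming a hypothetical LSRL ŷ = 2.2 + 0.6x and one made-up observed point; the helper names `residual` and `describe` are invented for illustration.

```python
def residual(actual_y, predicted_y):
    """Residual = actual y - predicted y (y - y-hat)."""
    return actual_y - predicted_y


def describe(res):
    """Translate the sign of a residual into over/underestimate language."""
    if res > 0:
        return "underestimated"  # actual point lies above the line
    if res < 0:
        return "overestimated"  # actual point lies below the line
    return "exactly predicted"


# Hypothetical LSRL y-hat = 2.2 + 0.6x and a made-up observed point (x = 4, y = 4).
y_hat = 2.2 + 0.6 * 4  # predicted value, about 4.6
res = residual(4, y_hat)  # 4 - 4.6 = -0.6
print(f"residual = {res:.4f}; the model {describe(res)} the actual value")
```

The residual is negative because the actual value (4) is below the predicted value (4.6), so the model overestimated.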
Residual Plots
- Used to determine whether the current linear model is appropriate.
- The horizontal axis usually shows the x-variable, and the vertical axis shows the residuals.
- Random scatter is GOOD! It means the current linear model is appropriate.
- A visible pattern (such as a curve) is BAD! It means the linear model is not appropriate and another model could fit better.
R-Squared: Coefficient of Determination
- Helps determine whether a linear model is appropriate after checking that the residual plot shows no visible pattern.
- The closer r-squared is to 1, the more of the variation in y the linear model accounts for.
- To interpret: r-squared is the percent of variation in [y] that can be accounted for by the LSRL relating [y in context] to [x in context].
- When reading computer output, we NEVER report r-squared adj (adjusted).
Correlation Coefficient (r)
- Measures both direction (+/-) and strength of a linear association: the closer r is to -1 or 1, the stronger; the closer to 0, the weaker.
- Correlation does NOT imply causation!
- Slope: $b = r \frac{s_y}{s_x}$
- y-intercept: $a = \bar{y} - b\bar{x}$
Important Notes on Correlation
- Correlation is a measure of association, not causation.
- Correlation is only appropriate to use to describe the strength and direction for linear relationships.
- Correlation does not measure form.
- Correlation is not a resistant measure of strength: like the mean and standard deviation, it can be strongly affected by outliers.
- Correlation has no unit of measurement and requires that both variables be quantitative; it makes no distinction between explanatory and response variables (switching x and y gives the same r).
Additional Considerations
- Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data (known as extrapolation).
- When asked to interpret the slope or y-intercept, include the word "predicted" in your response.
- Slope is the change in y over the change in x. The slope formula b = r(Sy/Sx) can be found on the AP Statistics Formula Sheet.
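A tiny sketch of a guard against extrapolation, using a hypothetical LSRL and made-up x-values (the helper `predict_with_check` is invented for illustration). It flags any x outside the observed range; how far outside counts as "much larger or much smaller" is still a judgment call, so treat the flag as a reminder rather than a rule.

```python
def predict_with_check(x_new, a, b, x_data):
    """Return (y-hat, extrapolating) for the LSRL y-hat = a + b*x.

    extrapolating is True when x_new falls outside the observed x-values.
    """
    y_hat = a + b * x_new
    extrapolating = not (min(x_data) <= x_new <= max(x_data))
    return y_hat, extrapolating


x_data = [1, 2, 3, 4, 5]  # made-up observed x-values
a, b = 2.2, 0.6  # hypothetical LSRL: y-hat = 2.2 + 0.6x

for x_new in (3.5, 50):
    y_hat, extrapolating = predict_with_check(x_new, a, b, x_data)
    note = "extrapolation: do not trust this prediction" if extrapolating else "within the data"
    print(f"x = {x_new}: y-hat = {y_hat:.4f} ({note})")
```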