Least Squares Regression Line (LSRL) Notes

Direction

  • Positive: As x increases, y increases.
  • Negative: As x increases, y decreases.

Outlier(s)

  • Identify any points that do not follow the overall trend.
  • Review 02.02 page 6 of 8.

Form

  • Does the association follow a linear trend?
  • Typical forms are linear, non-linear, or random scatter (no association).

Strength

  • How closely do the points follow a visible form?
  • If the points lie close to the model (line or curve) that could be drawn through them, the association is strong.
  • If linear, report r (correlation coefficient).

Common Mistakes

  • Correlation does NOT imply causation.
  • When describing the association in a scatterplot, discuss direction, form, strength, and unusual features in the context of the problem, using variable names.

Transformations

  • The transformation needed to achieve linearity depends on the form of the original association.
  • Two types of associations: Power (x^n) and Exponential (a^x).
  • Power: Graph log(x) vs. log(y).
  • Exponential: Graph x vs. log(y).
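
As a rough illustration of the two transformations above, the Python sketch below builds made-up power and exponential data and checks that the suggested log plots come out linear. The data values and the helper name `ls_slope` are assumptions for illustration only.

```python
import math

def ls_slope(xs, ys):
    """Least-squares slope of ys on xs (hypothetical helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

x = [1, 2, 3, 4, 5]

# Made-up power data: y = 2 * x^3.
y_power = [2 * xi**3 for xi in x]
# Power: graph log(x) vs. log(y). The slope of that line estimates the exponent.
log_x = [math.log10(v) for v in x]
log_y_power = [math.log10(v) for v in y_power]
power_slope = ls_slope(log_x, log_y_power)

# Made-up exponential data: y = 5 * 2^x.
y_exp = [5 * 2**xi for xi in x]
# Exponential: graph x vs. log(y). The slope estimates log10 of the base.
log_y_exp = [math.log10(v) for v in y_exp]
exp_slope = ls_slope(x, log_y_exp)

print(f"power slope: {power_slope:.4f}")
print(f"exponential slope: {exp_slope:.4f}")
```

Because the made-up data follow each model exactly, the first slope recovers the exponent 3 and the second recovers log10(2). Real data would only come close.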

Calculator Steps (TI-83/84)

  1. Enter explanatory variable (x) into L1 and response variable (y) into L2.
  2. For Power, find log(L1) and store in L3; find log(L2) and store in L4.
  3. For Exponential, leave L1 as is; find log(L2) and store in L3.

Checking Linearity After Transformation

  1. Check the scatterplot of the transformed x vs. y for a linear trend.
  2. Check the residual plot for no visible pattern.
  3. Check that r-squared is closer to 1 than it was before the transformation.
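
The r-squared comparison in the last step can be sketched in Python as follows. The data are made up to follow an exponential pattern, and `r_squared` is a hypothetical helper, so treat this as an illustration rather than a required procedure.

```python
import math

def r_squared(xs, ys):
    """Squared correlation between xs and ys (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Made-up exponential data: y = 3 * 2^x.
x = [0, 1, 2, 3, 4, 5]
y = [3 * 2**xi for xi in x]

before = r_squared(x, y)                              # original x vs. y
after = r_squared(x, [math.log10(v) for v in y])      # x vs. log(y)

print(f"r-squared before: {before:.4f}")
print(f"r-squared after:  {after:.4f}")
```

A higher r-squared after the transformation supports the new model, but the scatterplot and residual plot checks still come first.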

General Tips for Written Responses

  • Always show your work.
  • Round to four decimal places.
  • Include units for both x- and y-variables.
  • Use "predicted" or "estimated" when interpreting slope and y-intercept.
  • When interpreting slope, say "for each additional" or "for every one unit increase in [x in context]".
  • Define your variables (x and y-hat) with context (what they stand for).
  • Write answers in the context of the problem when interpreting slope, y-intercept, r, r-squared, residuals, etc.
  • The sign of a residual can feel backwards: a negative residual means the model overestimated, and a positive residual means the model underestimated.

Explanatory and Response Variables

  • Explanatory variable: variable used to explain or predict changes in the response variable; also known as the independent variable (x).
  • Response variable: variable that measures the outcome (prediction) in response to the explanatory variable; also known as the dependent variable (y).

Calculator Steps for TI-83/84

  1. Press STAT, then choose EDIT.
  2. Enter explanatory variable (x) in L1 and response variable (y) in L2.
  3. Then press STAT, go to CALC, and choose LinReg(a+bx).
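
For readers checking calculator output by hand, here is a minimal Python sketch of the same least-squares fit in the a + bx form. The lists `L1` and `L2` hold made-up data (hours studied and exam scores), and `linreg` is a hypothetical helper, not a calculator or library function.

```python
def linreg(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

L1 = [1, 2, 3, 4, 5]        # explanatory variable: hours studied (made up)
L2 = [52, 57, 61, 68, 72]   # response variable: exam score (made up)

a, b = linreg(L1, L2)
print(f"y-hat = {a:.4f} + {b:.4f}x")
```

Here a is the y-intercept and b is the slope, matching the order the calculator reports them.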

Slope and Y-Intercept Interpretation

  • Slope interpretation: On average, for each additional [x in context], the predicted [y in context] increases/decreases by [b units], where b is the slope.
  • Y-intercept interpretation: When [x in context] is zero, the predicted [y in context] is [a units], where a is the y-intercept.
  • To receive full credit, define the x and y variables in your LSRL.
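
The sentence templates above can be filled in mechanically. The sketch below does so in Python for a made-up line (y-hat = 46.7 + 5.1x, with x in hours studied and y-hat in exam points); the function names and context strings are assumptions for illustration.

```python
def interpret_slope(b, x_context, y_context, y_units):
    """Fill in the slope template; b is the slope of the LSRL."""
    direction = "increases" if b >= 0 else "decreases"
    return (f"On average, for each additional {x_context}, the predicted "
            f"{y_context} {direction} by {abs(b):.4f} {y_units}.")

def interpret_intercept(a, x_context, y_context, y_units):
    """Fill in the y-intercept template; a is the y-intercept of the LSRL."""
    return (f"When {x_context} is zero, the predicted "
            f"{y_context} is {a:.4f} {y_units}.")

slope_sentence = interpret_slope(5.1, "hour studied", "exam score", "points")
intercept_sentence = interpret_intercept(46.7, "hours studied", "exam score", "points")
print(slope_sentence)
print(intercept_sentence)
```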

Residual Values

  • Residual = Actual - Predicted value.
  • To interpret: The residual tells how far the model's prediction was from the actual value, and whether the model overestimated or underestimated it.
  • Tip: Always show all work when calculating a residual and include units.
  • Be careful, a positive residual is an UNDERestimate and a negative residual is an OVERestimate.
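
A short Python sketch of the residual calculation, assuming a made-up line y-hat = 46.7 + 5.1x (x in hours studied, y-hat in exam points). The function names are hypothetical.

```python
A, B = 46.7, 5.1   # made-up y-intercept and slope

def predict(x):
    """Predicted value from the line y-hat = A + B*x."""
    return A + B * x

def residual(x, actual):
    """Residual = Actual - Predicted."""
    return actual - predict(x)

res_pos = residual(4, 68)   # actual is above the line: positive residual, underestimate
res_neg = residual(3, 61)   # actual is below the line: negative residual, overestimate

for label, res in [("x = 4", res_pos), ("x = 3", res_neg)]:
    kind = "underestimate" if res > 0 else "overestimate"
    print(f"{label}: residual = {res:.1f} points ({kind})")
```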

Residual Plots

  • Used to determine whether the current linear model is appropriate.
  • The x-axis usually plots the x-variable, and the y-axis is usually the residuals.
  • Random Scatter is GOOD! It means that the current linear model is appropriate.
  • Visible pattern is BAD! It means that another model could be better.
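
To see what a "visible pattern" looks like in numbers, the Python sketch below fits a line to made-up curved data (y = x^2) and lists the residuals. The data are an assumption chosen so the arithmetic comes out exactly.

```python
# Made-up curved data: y = x^2, deliberately NOT linear.
x = [1, 2, 3, 4, 5]
y = [xi**2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residual = Actual - Predicted, one per data point, in x order.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(residuals)   # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals go positive, negative, then positive again as x increases. That U shape is the kind of pattern that suggests another model could be better, even though the residuals add up to zero.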

R-Squared: Coefficient of Determination

  • Helps determine whether a linear model is appropriate after checking that the residual plot shows no visible pattern.
  • The closer r-squared is to 1, the better the linear model fits the data.
  • To interpret: r-squared is the percent of variation in [y] that can be accounted for by the LSRL relating [y in context] to [x in context].
  • When reading computer output, we NEVER report r-squared adj (adjusted).
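
One way to see why r-squared is read as a "percent of variation accounted for" is to compute it from the residuals. The Python sketch below uses made-up data (hours studied and exam scores) and the standard identity r-squared = 1 - SSE/SST; the variable names are assumptions.

```python
hours = [1, 2, 3, 4, 5]          # made-up explanatory data
scores = [52, 57, 61, 68, 72]    # made-up response data

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar

predicted = [a + b * x for x in hours]
sse = sum((y - p) ** 2 for y, p in zip(scores, predicted))  # variation left over
sst = sum((y - y_bar) ** 2 for y in scores)                 # total variation in y
r_sq = 1 - sse / sst

print(f"r-squared = {r_sq:.4f}")
```

For these made-up numbers, about 99.27% of the variation in exam score is accounted for by the LSRL relating exam score to hours studied.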

Correlation Coefficient (r)

  • Measures both direction (+/-) and strength (closer to –1 or 1 stronger, closer to 0 weaker).
  • Correlation does NOT imply causation!
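
For reference, r can be computed as the average product of z-scores, using n - 1 in the denominator. The Python sketch below applies that formula to made-up data; `correlation` is a hypothetical helper built on the standard library's `statistics` module.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]               # made-up data
scores_up = [52, 57, 61, 68, 72]      # rises with hours
scores_down = [72, 68, 61, 57, 52]    # falls with hours

r_pos = correlation(hours, scores_up)
r_neg = correlation(hours, scores_down)
print(f"r (positive association) = {r_pos:.4f}")
print(f"r (negative association) = {r_neg:.4f}")
```

The sign gives the direction, and the distance from 0 gives the strength.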

Formulas

  • Slope: b = r * (Sy / Sx), where Sx and Sy are the standard deviations of x and y
  • y-intercept: a = y-bar - b * x-bar, where x-bar and y-bar are the means of x and y
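
A worked example of both formulas in Python, using made-up summary statistics (all five values below are assumptions chosen to keep the arithmetic simple):

```python
# Made-up summary statistics.
r = 0.8        # correlation coefficient
s_x = 2.0      # standard deviation of x
s_y = 5.0      # standard deviation of y
x_bar = 10.0   # mean of x
y_bar = 40.0   # mean of y

b = r * (s_y / s_x)      # slope: 0.8 * 2.5
a = y_bar - b * x_bar    # y-intercept: 40 - (slope * 10)

print(f"y-hat = {a:.4f} + {b:.4f}x")
```

Note that the slope must be found first, since the y-intercept formula uses it.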

Important Notes on Correlation

  • Correlation is a measure of association, not causation.
  • Correlation is only appropriate to use to describe the strength and direction for linear relationships.
  • Correlation does not measure form.
  • Correlation is not a resistant measure of strength: like the mean and standard deviation, it is strongly affected by outliers.
  • Correlation has no unit of measurement and requires that both variables be quantitative; it makes no distinction between explanatory and response variables.
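
Several of these properties can be checked numerically. The Python sketch below uses made-up data and a hypothetical `correlation` helper to show that r is unchanged when x and y are swapped or when units change, and that a single outlier can change it drastically.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the average product of z-scores (n - 1 in the denominator)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]          # made-up data
scores = [52, 57, 61, 68, 72]

r = correlation(hours, scores)

# No distinction between explanatory and response: swapping gives the same r.
r_swapped = correlation(scores, hours)

# No units: converting hours to minutes leaves r unchanged.
minutes = [60 * h for h in hours]
r_rescaled = correlation(minutes, scores)

# Not resistant: one made-up outlier (6 hours, score 20) changes r drastically.
r_outlier = correlation(hours + [6], scores + [20])

print(f"original: {r:.4f}, swapped: {r_swapped:.4f}, "
      f"rescaled: {r_rescaled:.4f}, with outlier: {r_outlier:.4f}")
```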

Additional Considerations

  • Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data (known as extrapolation).
  • When asked to interpret the slope or y intercept, include the word predicted in your response.
  • Slope is the change in y over the change in x. The slope formula, b = r * (Sy / Sx), can be found on the AP Statistics Formula Sheet.
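
The extrapolation warning can be sketched in Python as a simple range check. The line (y-hat = 46.7 + 5.1x), the observed range of x (1 to 5 hours), and the function name are all assumptions for illustration. The check flags any x outside the observed range, which is stricter than "much larger or much smaller".

```python
def predict_with_check(x, a, b, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y-hat = a + b*x.

    is_extrapolation is True when x lies outside the observed x-values.
    """
    is_extrapolation = x < x_min or x > x_max
    return a + b * x, is_extrapolation

A, B = 46.7, 5.1       # made-up y-intercept and slope
X_MIN, X_MAX = 1, 5    # made-up range of observed hours studied

inside = predict_with_check(3, A, B, X_MIN, X_MAX)
outside = predict_with_check(20, A, B, X_MIN, X_MAX)

for x, (y_hat, flagged) in [(3, inside), (20, outside)]:
    note = "extrapolation, do not trust" if flagged else "within the data"
    print(f"x = {x}: predicted score = {y_hat:.1f} ({note})")
```

At x = 20 the made-up line predicts a score of about 148.7, which shows how a prediction far outside the data can be meaningless.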