Linear Regression: Interpretation and Extrapolation
Regression Line Fundamentals
The least squares regression line provides the best linear fit for observed data, yielding an output of slope ($a$) and y-intercept ($b$).
The general algebraic form is $\hat{y} = ax + b$, where:
$a$ represents the slope of the line.
$b$ represents the y-intercept.
All regression lines inherently pass through the point $(\bar{x}, \bar{y})$ (the mean of the x-values and the mean of the y-values). This point represents the best central estimate from the data.
Rounding: Textbooks often use four decimal places. Always adhere to specific rounding instructions given for homework or exams.
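To make the fundamentals concrete, here is a minimal Python sketch of the standard least squares formulas for $a$ and $b$, followed by a check that the fitted line passes through $(\bar{x}, \bar{y})$. The function name least_squares and the small data set are hypothetical and chosen only for illustration; they do not come from the notes.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x-values are identical; slope is undefined")
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept, chosen so the line hits (x_bar, y_bar)
    return a, b

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = least_squares(xs, ys)

# The fitted line evaluated at x_bar should reproduce y_bar.
y_at_x_bar = a * mean(xs) + b
print(round(a, 4), round(b, 4), round(y_at_x_bar, 4))
```

Because $b$ is computed as $\bar{y} - a\bar{x}$, substituting $x = \bar{x}$ into the line always gives $\bar{y}$, which is why every least squares line passes through that point.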
Interpreting the Slope ($a$)
Definition: The slope is defined as "rise over run," representing the vertical change in the y-variable for every horizontal change in the x-variable. It shows the rate of change of the response variable with respect to the explanatory variable.
Contextual Interpretation - Key Phraseology: For every one-unit increase in the explanatory variable ($x$), we expect the response variable ($y$) to change by the value of the slope.
Cautionary Language:
It is crucial to use language that reflects uncertainty in statistical models, such as "we expect this to happen."
Avoid definitive statements like "this will happen," as statistical models are based on samples and inherent uncertainty, not entire populations.
Algebraic Analogy: In the equation $y = ax + b$, if $x$ increases from $x_0$ to $x_0 + 1$, $y$ changes from $ax_0 + b$ to $a(x_0 + 1) + b$ (a change of $a$, which is the slope). Each one-unit increase in $x$ leads to an $a$-unit change in $y$.
Example (Burger): If x (fat in grams) increases by one gram, we expect y (calories) to increase by the value of the slope.
General Interpretation Template: "For every extra one [unit of x-variable], the [y-variable] is expected to [increase/decrease] by $|a|$."
Use "increase" if the slope is positive (a > 0).
Use "decrease" if the slope is negative (a < 0), but use the absolute value of the slope (do not include the negative sign with the word "decrease").
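The template and the increase/decrease rule can be sketched as a small Python helper. The function name interpret_slope, its parameters, and the sample slope values are hypothetical, introduced only for illustration. The zero-slope branch is an assumption of this sketch, since the notes cover only positive and negative slopes.

```python
def interpret_slope(a, x_unit, y_name, y_unit="", decimals=4):
    """Fill in the slope interpretation template for slope a."""
    if a == 0:
        # Assumption: this case is not covered by the notes' template.
        return f"For every extra one {x_unit}, the {y_name} is not expected to change."
    # Positive slope -> "increase"; negative slope -> "decrease".
    direction = "increase" if a > 0 else "decrease"
    # Report the absolute value so no negative sign follows "decrease".
    magnitude = round(abs(a), decimals)
    sentence = (
        f"For every extra one {x_unit}, the {y_name} "
        f"is expected to {direction} by {magnitude} {y_unit}"
    )
    return sentence.strip() + "."

# Hypothetical slopes, for illustration only.
positive_case = interpret_slope(11.5, "gram of fat", "calorie count", "calories")
negative_case = interpret_slope(-2.5, "hour", "score", "points")
print(positive_case)
print(negative_case)
```

Note that the wording stays hedged ("is expected to") rather than definitive, matching the cautionary language above.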
Interpreting the Y-intercept ($b$)
Definition: Algebraically, the y-intercept is the value of $y$ when $x = 0$.
Contextual Interpretation: If the explanatory variable ($x$) is zero, we expect the response variable ($y$) to be the value of the y-intercept.
Conditions for Appropriate Interpretation: It is not always appropriate to interpret the y-intercept in context. Always consider these two questions:
Is zero a reasonable value for the explanatory variable ($x$) in the real world?
Example: If x is height, a person cannot be 0 inches tall. Thus, it's inappropriate to interpret.
If not reasonable, the interpretation should state: "It is inappropriate to interpret the y-intercept" or "It does not make sense to interpret the y-intercept."
Do we have any observed data points near $x = 0$?
Example: If x is temperature and all collected data is from summer months, then even if 0 is a reasonable temperature, we have no data near $x = 0$.
Reasoning: The relationship between x and y may not remain consistent at extreme values outside the observed data range. Extrapolating to $x = 0$ in such cases can lead to untrustworthy predictions.
If the answer to either question is "no," then it is inappropriate to interpret the y-intercept. Failing just one of the two conditions is enough.
Self-Correction: If you accidentally interpret the y-intercept when x = 0 is unreasonable, the resulting statement often sounds illogical (e.g., negative weight for zero height), prompting self-correction. However, for the second condition (no data near $x = 0$), the interpretation might sound logical but still be untrustworthy.
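The two-question check can be sketched in Python as follows. Whether zero is a reasonable value is a judgment about the real-world context, so it is passed in as a flag. How close counts as "near" zero is also a judgment call, so the near_zero_tolerance parameter is an assumption of this sketch, as are the function name and the sample data.

```python
def can_interpret_intercept(zero_is_reasonable, x_values, near_zero_tolerance):
    """Return True only if BOTH conditions for interpreting b are met."""
    # Condition 1: is x = 0 a reasonable real-world value? (supplied by the analyst)
    # Condition 2: is at least one observed x-value close to zero?
    has_data_near_zero = any(abs(x) <= near_zero_tolerance for x in x_values)
    return zero_is_reasonable and has_data_near_zero

# Hypothetical data, for illustration only.
heights_in = [60, 64, 67, 70, 74]   # zero height is not reasonable
summer_temps = [75, 80, 85, 90]     # zero is a possible temperature, but no data near it
winter_temps = [-2, 5, 12, 20]      # zero is reasonable and data lie near it

height_ok = can_interpret_intercept(False, heights_in, near_zero_tolerance=5)
summer_ok = can_interpret_intercept(True, summer_temps, near_zero_tolerance=10)
winter_ok = can_interpret_intercept(True, winter_temps, near_zero_tolerance=10)
print(height_ok, summer_ok, winter_ok)
```

The summer case is the one to watch: the first condition passes, so the interpretation would sound logical, yet the check still fails because no observations lie near zero.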
Extrapolation: Using the Model Beyond Its Bounds
Definition: Extrapolation occurs when using a regression model to make predictions for values of the explanatory variable ($x$) that are significantly larger or smaller than the range of the observed data used to create the model.
Problem: We lack certainty that the established linear relationship (or any relationship) between x and y will continue to hold true outside the observed data range. The behavior might change.
Trustworthiness: While a linear equation can mathematically provide a y value for any x, the trustworthiness of that prediction decreases as the value moves further away from the mean of the collected data.
Examples of Volatility/Non-linearity at Extremes:
Oil Prices: Highly volatile, making long-term predictions unreliable.
Weather Patterns: Relationships (e.g., temperature and rainfall) differ significantly between seasons (e.g., summer rain vs. winter snow/sleet).
Public Opinion: Can shift rapidly and unexpectedly.
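A simple guard against accidental extrapolation is to compare each new x-value with the observed range before trusting the prediction. The Python sketch below is one way to do that. The function name predict_with_flag, the margin parameter (how far outside the range still counts as acceptable), and all numbers are hypothetical.

```python
def predict_with_flag(a, b, x_new, x_observed, margin=0.0):
    """Return (y_hat, is_extrapolated) for the line y-hat = a*x + b."""
    low, high = min(x_observed), max(x_observed)
    # Flag the prediction if x_new falls outside the observed range
    # (widened by an optional margin, which is an assumption of this sketch).
    is_extrapolated = x_new < low - margin or x_new > high + margin
    return a * x_new + b, is_extrapolated

# Hypothetical model and data, for illustration only.
observed_x = [10, 12, 15, 18, 20]
a, b = 2.0, 1.0

inside = predict_with_flag(a, b, 14, observed_x)    # within the observed range
outside = predict_with_flag(a, b, 40, observed_x)   # far beyond the observed range
print(inside, outside)
```

The equation returns a number in both cases; the flag only records that the second prediction rests on a relationship the data never tested.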
Example: Height and Weight Regression Analysis
Scenario: A sample of people with recorded height (inches) and weight (pounds).
Explanatory variable ($x$): Height
Response variable ($y$): Weight
Steps for Finding the Least Squares Regression Line (Calculator Usage):
Enter height data into List 1 (L1) and weight data into List 2 (L2) of the calculator. Ensure lists are of equal length and corresponding values are aligned.
Use the LinReg(ax+b) function (e.g., STAT > CALC > 4:LinReg(ax+b)), specifying L1 as the Xlist and L2 as the Ylist.
Example Calculator Output (Illustrative Values):
Slope ($a$): (rounded to four decimal places)
Y-intercept ($b$): (rounded to four decimal places)
Note: Pay close attention to negative signs for $a$ and $b$.
Constructing the Regression Equation:
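The predicted weight ($\hat{y}$) equation: $\widehat{\text{weight}} = a(\text{height}) + b$, with the calculator's values substituted for $a$ and $b$.
Alternatively, using generic variables: $\hat{y} = ax + b$, again with the calculator's values substituted. (Remember the hat on $\hat{y}$ and the $x$ variable.)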
Interpreting the Slope for Height and Weight:
Slope:
Interpretation: "For every extra one inch taller a person is, we expect their weight to increase by pounds." (Since the slope is positive, we use "increase").
Interpreting the Y-intercept for Height and Weight:
Y-intercept:
Condition 1 Check (Reasonable value for $x$): Can a person be 0 inches tall? No, this is not a reasonable value.
Condition 2 Check (Data near $x = 0$): The observed height data ranges from inches to inches. There are no observations near $x = 0$.
Conclusion: It is inappropriate to interpret the y-intercept. A statement like "If a person is 0 inches tall, we expect their weight to be pounds" is illogical (negative weight).