Notes on Least-squares Regression
3.2: Least-squares Regression
Definition:
A method for determining a line that summarizes the relationship between two variables.
This method applies specifically when one variable (explanatory variable) helps explain or predict the other variable (response variable).
The method gives an exact formula for the line, which distinguishes it from subjective approaches such as drawing a line by eye on a scatterplot.
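To make the "exact formula" concrete, here is a minimal sketch of the standard least-squares computation for a line of the form y-hat = a + b*x. The function name `least_squares_fit` and the four data points are assumptions made up for illustration; they do not come from these notes.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes the
    sum of squared vertical distances between the points and the line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_bar = sum(xs) / n  # mean of the explanatory variable
    y_bar = sum(ys) / n  # mean of the response variable

    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")

    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x, so the fit should recover it.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_fit(xs, ys)
print(a, b)
```

Because the same data always produce the same a and b, two people fitting the same scatterplot get the same line, which is not true of lines drawn by eye.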
Regression Line
Definition:
A straight line that models the relationship between the response variable, denoted as y, and the explanatory variable, denoted as x.
Use:
The regression line is employed to predict the value of y based on a specific value of x.
Unlike correlation, which measures the strength and direction of a linear relationship without designating either variable as explanatory, regression requires a clear distinction between the explanatory and response variables.
Example:
For instance, when predicting a car's price from miles driven, the regression line estimates the average price of cars that have been driven a given number of miles.
Regression Equation
Formula:
The regression relationship can be expressed as:
$\hat{y} = a + bx$
Where:
$y$ : The observed value (actual value of the response variable).
$\hat{y}$ : The predicted value of the response variable for a given value of the explanatory variable $x$ (often read as "y-hat").
$b$ : The slope of the regression line, indicating how much the predicted value of $y$ changes with each 1-unit increase in $x$.
$a$ : The y-intercept, or the predicted value of $y$ when $x = 0$.
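The distinction between the observed value and the predicted value can be sketched in a few lines. The coefficients and the observed point below are hypothetical values chosen for illustration, and `predict` is an assumed helper name. The difference between observed and predicted is commonly called a residual.

```python
def predict(a, b, x):
    """Predicted response (y-hat) for a given x on the line y-hat = a + b*x."""
    return a + b * x


# Hypothetical intercept and slope, not taken from any real dataset.
a, b = 1.0, 2.0

x = 3
y_observed = 7.5            # y: the value actually seen in the data
y_hat = predict(a, b, x)    # y-hat: what the line predicts at the same x
residual = y_observed - y_hat

print(y_hat, residual)
```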
Interpretation of the Regression Equation
Slope Interpretation:
For every one-unit increase in the explanatory variable (stated in context), the response variable (stated in context) is predicted to increase, if the slope is positive, or decrease, if the slope is negative, by the absolute value of the slope (with appropriate units).
Y-Intercept Interpretation:
When the explanatory variable (stated in context) is 0, the response variable (stated in context) is predicted to equal the value of the y-intercept (with appropriate units).
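The two interpretation templates can be sketched as small helper functions that fill in the context. The function names, the car-price context, and the coefficient values (a slope of -0.15 dollars per mile and an intercept of 25000 dollars) are all hypothetical and serve only to show the sentence pattern.

```python
def interpret_slope(b, x_name, x_unit, y_name, y_unit):
    """Build the slope sentence; the sign of b decides the wording."""
    direction = "increases" if b >= 0 else "decreases"
    return (f"For every additional {x_unit} of {x_name}, "
            f"the predicted {y_name} {direction} by {abs(b):g} {y_unit}.")


def interpret_intercept(a, x_name, y_name, y_unit):
    """Build the y-intercept sentence."""
    return f"When {x_name} is 0, the predicted {y_name} is {a:g} {y_unit}."


# Hypothetical coefficients for predicting price from distance driven.
slope_text = interpret_slope(-0.15, "distance driven", "mile", "price", "dollars")
intercept_text = interpret_intercept(25000, "distance driven", "price", "dollars")

print(slope_text)
print(intercept_text)
```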
Data Prediction Techniques
Interpolation:
The practice of predicting a value within the range of the data provided (i.e., for an x-value between the smallest and largest x-values in the dataset).
Note: Generally safe, because the prediction is supported by observed data.
Extrapolation:
The practice of predicting a value outside the range of the given data (i.e., for an x-value smaller than the smallest or larger than the largest x-value in the dataset).
Caution: Extrapolated predictions can be unreliable because no data confirm that the relationship continues to hold beyond the observed range.
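One way to apply this distinction in practice is to check a new x-value against the range of observed x-values before trusting a prediction. The sketch below assumes a helper named `prediction_type` and uses made-up x-values; it illustrates the idea and is not a procedure prescribed by these notes.

```python
def prediction_type(x_new, observed_xs):
    """Classify a prediction at x_new relative to the observed x-range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical x-values from an original dataset.
observed_xs = [10, 25, 40, 60]

inside = prediction_type(30, observed_xs)   # 30 lies between 10 and 60
outside = prediction_type(90, observed_xs)  # 90 lies beyond the largest x

print(inside, outside)
```

A check like this only flags whether x falls outside the observed range; it says nothing about how far beyond the range the linear pattern might still hold.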
Additional Considerations
The y-intercept is not always meaningful in context: when x = 0 lies outside the range of the observed data, interpreting the y-intercept is itself an extrapolation.
Care should be taken to avoid predicting for x-values that are significantly greater or less than the values present in the original dataset.