Study Notes on the Least-Squares Regression Line
Section 11.2: The Least-Squares Regression Line
OBJECTIVES
Objective 1: Compute the least-squares regression line.
Objective 2: Compute the correlation coefficient.
Objective 3: Use the least-squares regression line to make predictions.
Objective 4: Interpret predicted values, the slope, and the y-intercept of the least-squares regression line.
OBJECTIVE 1: COMPUTE THE LEAST-SQUARES REGRESSION LINE
A sample of house data is provided: size in square feet and selling price in thousands of dollars.
Previous analysis revealed a strong positive linear association between the size and sales price of houses.
Data Table: Size (square feet) and Selling Price (thousands of dollars)
Size (sq ft)   Selling Price ($1000s)
2521           400
2555           426
2735           428
2846           435
3028           469
3049           475
3198           488
3198           455
LEAST-SQUARES REGRESSION LINE
The least-squares regression line minimizes the sum of the squared vertical distances between the data points and the line itself.
Best-Fitting Line:
The best-fitting line is the one for which the sum of these squared vertical distances is smallest; no other line gives a smaller sum.
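To make "minimizes the sum of squared vertical distances" concrete, here is a minimal Python sketch (standard library only). It fits the line using the usual deviation formulas and then checks that moving the intercept or slope away from the fitted values makes the sum of squared vertical distances larger. The helper names `sse` and `least_squares` and the nudge sizes are illustrative choices, not part of these notes.

```python
# House data from the table: size (sq ft) and selling price ($1000s)
sizes = [2521, 2555, 2735, 2846, 3028, 3049, 3198, 3198]
prices = [400, 426, 428, 435, 469, 475, 488, 455]


def sse(a, b, xs, ys):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = least_squares(sizes, prices)
best = sse(a, b, sizes, prices)

# Moving the intercept or slope away from the least-squares values
# always increases the sum of squared vertical distances.
for da, db in [(5, 0), (-5, 0), (0, 0.001), (0, -0.001)]:
    print(da, db, sse(a + da, b + db, sizes, prices) > best)  # True on every line
```

Because the sum of squares is a bowl-shaped (quadratic) function of a and b, any nudge in any direction raises it. That is why the least-squares line is unique.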
Equation of the Least-Squares Regression Line:
The general form for predicting y from x is:
$\hat{y} = a + bx$
where:
$\hat{y}$ = predicted value of y
$a$ = y-intercept
$b$ = slope.
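The notes do not write out how a and b are computed. For reference, the standard least-squares formulas are shown below. The slope can equivalently be written as $r\,s_y/s_x$, where r is the correlation coefficient and $s_x, s_y$ are the sample standard deviations. The second line plugs in sums that we computed from the house data table ($\bar{x} = 2891.25$, $\bar{y} = 447$).

```latex
\begin{aligned}
b &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = r\,\frac{s_y}{s_x},
\qquad a = \bar{y} - b\,\bar{x} \\
b &= \frac{50431}{508387.5} \approx 0.0992,
\qquad a = 447 - \frac{50431}{508387.5}\,(2891.25) \approx 160.1939
\end{aligned}
```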
Variable Definitions:
The variable we want to predict (selling price) is known as the outcome variable or dependent variable.
The variable we are given (size) is known as the explanatory variable or independent variable.
CALCULATOR USAGE TO COMPUTE REGRESSION LINE
A one-time setting on the calculator is required to display the correlation coefficient.
TI-84 Plus Calculator Steps:
To configure: press 2nd, then 0 (CATALOG), scroll down to DiagnosticOn, and press ENTER twice.
After this setting is made, compute the least-squares regression line and the correlation coefficient for the house data with STAT → CALC → LinReg(a+bx).
OBJECTIVE 3: USE THE LEAST-SQUARES REGRESSION LINE TO MAKE PREDICTIONS
Predicted Value: Predictions can be made by substituting the explanatory variable's value into the regression equation.
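Example Calculation:
The least-squares regression line for predicting selling price from size, computed from the house data, is:
$\hat{y} = 160.1939 + 0.0992x$
Predicting the selling price for a house of 2800 sq ft:
Calculation: $\hat{y} = 160.1939 + 0.0992(2800) = 437.9539$, so the predicted selling price is about 437.95 thousand dollars.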
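The substitution step amounts to evaluating a linear function. Here is a minimal Python sketch that assumes the rounded coefficients of the house-data line, $\hat{y} = 160.1939 + 0.0992x$. The function name `predict_price` is hypothetical.

```python
# Rounded least-squares coefficients for the house data (price in $1000s, size in sq ft)
A = 160.1939  # y-intercept
B = 0.0992    # slope


def predict_price(size_sqft):
    """Predicted selling price, in thousands of dollars, from y-hat = A + B*x."""
    return A + B * size_sqft


print(round(predict_price(2800), 2))  # 437.95
```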
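Point of Averages:
Average size of houses: $\bar{x} = 2891.25$ sq ft.
Average selling price: $\bar{y} = 447$ thousand dollars.
Substituting the average size into the regression equation returns the average selling price: $160.1939 + 0.0992(2891.25) \approx 447$. The least-squares regression line always passes through the point of averages $(\bar{x}, \bar{y})$.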
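Why the line always passes through the point of averages: substitute $\bar{x}$ into the equation and use the standard least-squares intercept formula $a = \bar{y} - b\bar{x}$. The second line repeats the check with the rounded house-data coefficients. The small gap from 447 comes only from rounding a and b.

```latex
\begin{aligned}
\hat{y}(\bar{x}) &= a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y} \\
160.1939 + 0.0992(2891.25) &= 447.0059 \approx 447 = \bar{y}
\end{aligned}
```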
OBJECTIVE 4: INTERPRET PREDICTED VALUES, THE SLOPE, AND THE y-INTERCEPT
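Predicted Values: A predicted value estimates the average outcome for all individuals with a given value of the explanatory variable.
Example: With the equation $\hat{y} = 160.1939 + 0.0992x$, estimate the average price of houses with 3000 sq ft:
Substitute: $\hat{y} = 160.1939 + 0.0992(3000) = 457.7939$. The estimated average price of houses with 3000 sq ft is about 457.79 thousand dollars.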
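Interpreting the y-Intercept (a):
The y-intercept is the point where the line crosses the y-axis, that is, the predicted value when x = 0.
It has a useful interpretation only if the data include both positive and negative x-values; in that case it estimates the outcome when x = 0.
If the x-values are all positive or all negative, the y-intercept has no useful practical interpretation. For example, every house size in the data is positive, so a = 160.1939 (the predicted price of a 0 sq ft house) is not meaningful.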
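The rule above reduces to a one-line check. This minimal Python sketch encodes only the condition stated in these notes, namely that the x-values must include both positive and negative numbers. The function name `intercept_is_interpretable` is hypothetical.

```python
def intercept_is_interpretable(xs):
    """True only when the x-values include both negative and positive numbers,
    the condition these notes give for interpreting the y-intercept."""
    return min(xs) < 0 < max(xs)


sizes = [2521, 2555, 2735, 2846, 3028, 3049, 3198, 3198]  # all positive
print(intercept_is_interpretable(sizes))       # False
print(intercept_is_interpretable([-3, 0, 4]))  # True
```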
INTERPRETING THE SLOPE (b)
The slope represents how much the predicted value (outcome variable) changes for each one-unit change in x (explanatory variable):
If the values of the explanatory variable differ by 1, the predicted values differ by the amount of the slope, b. For the house data, b = 0.0992, so a house that is 1 sq ft larger has a predicted price 0.0992 thousand dollars (about 99 dollars) higher.
If the values of the explanatory variable differ by an amount d, the predicted values differ by b × d.
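The rule follows directly from the equation. When two predictions are subtracted, the intercept cancels:

```latex
\hat{y}_1 - \hat{y}_2 = (a + b x_1) - (a + b x_2) = b\,(x_1 - x_2) = b\,d
```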
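Example Comparison:
Consider two houses, one of 1900 sq ft and another of 1750 sq ft. Predict the difference in their selling prices.
Solution: the sizes differ by d = 150 sq ft, so the predicted prices differ by 0.0992 × 150 = 14.88 thousand dollars; the larger house is predicted to sell for about 14.88 thousand dollars more.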
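Here is a quick numeric check of this comparison as a minimal Python sketch, assuming the rounded house-data coefficients a = 160.1939 and b = 0.0992. Computing both full predictions and subtracting gives the same result as multiplying the slope by the size difference, because the intercept cancels.

```python
A, B = 160.1939, 0.0992  # rounded house-data line: price ($1000s) vs. size (sq ft)

big, small = 1900, 1750
diff_from_predictions = (A + B * big) - (A + B * small)
diff_from_slope = B * (big - small)

print(round(diff_from_predictions, 2), round(diff_from_slope, 2))  # 14.88 14.88
```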
CHECK YOUR UNDERSTANDING - STUDENT PERFORMANCE EXAMPLE
At the final exam, students reported how many hours they studied. The least-squares regression line for predicting exam score from hours studied is:
.
Predict the score for Antoine, who studied for 6 hours:
Substitute: .
Effect of studying more hours: Emma studied 3 hours longer than Jeremy. Using the slope of 5, their predicted scores differ by 5 × 3 = 15, so Emma's predicted score is 15 points higher than Jeremy's.
KEY CONCEPTS TO REMEMBER
Definitions of outcome (response) and explanatory (predictor) variables.
Calculation of the least-squares regression line.
Application of the regression line to make predictions.
Interpretation of predicted values, y-intercepts, and slopes in regression analysis.