Make predictions using regression lines, taking into account the dangers of extrapolation.
Calculate and interpret a residual.
Interpret the slope and y-intercept of a regression line.
Determine the equation of a least-squares regression line using technology or computer output.
Construct and interpret residual plots to assess the appropriateness of a regression model.
Interpret the standard deviation of the residuals and $r^2$, and utilize these values to evaluate how well a least-squares regression line models the relationship between two variables.
Describe how the least-squares regression line, standard deviation of the residuals, and $r^2$ are influenced by unusual points.
Calculate the slope and y-intercept of the least-squares regression line from the means and standard deviations of $x$ and $y$, along with their correlation.
Introduction to Regression
Regression lines, often termed simple linear regression models, involve only one explanatory variable.
They highlight linear (straight-line) relationships between two quantitative variables, observable in diverse settings like Major League Baseball statistics, geyser eruption times, and award distributions.
Definition of Regression Line
A regression line models the relationship between a response variable $y$ and an explanatory variable $x$.
The formula for a regression line is expressed as:
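$\hat{y} = a + bx$
where:
$\hat{y}$: Predicted value of $y$ for a specific value of $x$
$a$: y-intercept, the predicted value of $y$ when $x = 0$
$b$: slope, the predicted change in $y$ for each 1-unit increase in $x$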
Predicting Values
Example: Used Ford F-150 Pricing
A study of 16 used Ford F-150 SuperCrew 4x4 trucks explores the relationship between miles driven and price.
For each truck, the data record the miles driven and the price.
The regression line used in this example is $\widehat{\text{price}} = 38{,}257 - 0.1629(\text{miles})$. Using it to predict the price of a truck with 300,000 miles gives:
$\widehat{\text{price}} = 38{,}257 - 0.1629(300{,}000) = -10{,}613$
A negative price is nonsensical. This illustrates the danger of extrapolation: using the regression line to make predictions beyond the range of the observed data.
Definition of Extrapolation:
Extrapolation is the use of a regression line for prediction outside the interval of $x$ values used to obtain the line. Such predictions are often unreliable.
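As a rough illustration of prediction and the extrapolation warning, the sketch below evaluates the F-150 line from the example and flags any input outside the observed range. The function name `predict_price` and the bounds `MIN_MILES` and `MAX_MILES` are assumptions made for illustration; these notes do not list the actual range of miles in the data.

```python
# Intercept and slope taken from the F-150 example above.
INTERCEPT = 38257.0
SLOPE = -0.1629

# Hypothetical range of miles in the data (not given in these notes).
MIN_MILES = 5_000
MAX_MILES = 150_000


def predict_price(miles):
    """Return (predicted price, True if the prediction is an extrapolation)."""
    predicted = INTERCEPT + SLOPE * miles
    is_extrapolation = not (MIN_MILES <= miles <= MAX_MILES)
    return predicted, is_extrapolation


price, extrapolated = predict_price(300_000)
print(f"predicted price: {price:.0f}, extrapolation: {extrapolated}")
```

Returning the flag alongside the prediction keeps the warning attached to the number, so a negative price like this one is not reported without context.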
Calculating and Interpreting Residuals
Residuals are the prediction errors resulting from a regression line, defined as:
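$\text{Residual} = \text{Actual } y - \text{Predicted } y = y - \hat{y}$
In the Ford F-150 data, a truck with an actual price of \$21,994 and a predicted price of \$26,759 has residual $21{,}994 - 26{,}759 = -4{,}765$: its actual price is \$4,765 less than the regression line predicts.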
Example Problem: Calculate the residual for Andres, who grabbed 36 Starburst candies when the regression line predicted 32.46.
Computed residual: $36 - 32.46 = 3.54$
Interpretation: Andres grabbed 3.54 more candies than predicted.
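The residual arithmetic can be checked with a short sketch. The helper name `residual` is an assumption; the numbers are the actual and predicted values quoted in the two examples above.

```python
def residual(actual, predicted):
    """Residual = actual y - predicted y."""
    return actual - predicted


# Ford F-150: actual price 21,994, predicted price 26,759.
truck = residual(21_994, 26_759)

# Andres: grabbed 36 candies, predicted 32.46.
andres = residual(36, 32.46)

print(truck)
print(round(andres, 2))
```

A negative residual means the actual value falls below the line (the truck), and a positive residual means it falls above the line (Andres).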
Assessing Model Appropriateness with Residual Plots
Residual Plot Definition: A scatterplot of the residuals (errors) plotted against the explanatory variable.
Assessing linearity:
If the residuals show no leftover pattern (random scatter around 0), a linear model may be appropriate.
If the residuals display a pattern (e.g., a U-shape), the relationship is not linear and a non-linear model might be needed.
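To see what a leftover pattern looks like, the sketch below fits a least-squares line to deliberately curved data and lists the residuals in order of $x$. The data ($y = x^2$) and the helper `fit_line` are hypothetical, chosen only to produce a visible pattern; a plotting library could draw the same residuals as an actual residual plot.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical curved data: y = x^2, so a straight line is the wrong model.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Residuals listed in order of x; a residual plot would show these against x.
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
```

For this data the residuals are positive at both ends and negative in the middle, which is the U-shape that signals a linear model is not appropriate.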
Standard Deviation of Residuals and Coefficient of Determination
The standard deviation of the residuals $s$ indicates the size of a typical residual (error) and helps evaluate model fit. It can be calculated as: $s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
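A minimal sketch of this calculation follows. The function name `residual_sd` and the five residuals are hypothetical values used only to show the arithmetic.

```python
import math


def residual_sd(residuals):
    """Standard deviation of the residuals: sqrt(sum of squares / (n - 2))."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than 2 residuals")
    return math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))


# Hypothetical residuals from a least-squares line fit to 5 points.
example_residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

s = residual_sd(example_residuals)
print(round(s, 3))
```

The divisor is $n - 2$ rather than $n$, so the function rejects inputs with two or fewer residuals instead of dividing by zero.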
The coefficient of determination $r^2$ measures the proportion of the variation in the response variable that is explained by the model:
Defined as: $r^2 = 1 - \dfrac{\text{Sum of squared residuals from the regression line}}{\text{Sum of squared residuals from the mean}}$
Interpretation: $r^2$, usually reported as a percentage, is the share of the variability in $y$ that is accounted for by the least-squares regression line.
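The definition can be sketched directly from the two sums of squares. The function name `r_squared` and the small data set are hypothetical; the predicted values come from the least-squares line $\hat{y} = 2.2 + 0.6x$ fit to $x = 1, \dots, 5$.

```python
def r_squared(ys, predicted):
    """r^2 = 1 - (SS of residuals from the line) / (SS of residuals from the mean)."""
    y_bar = sum(ys) / len(ys)
    ss_line = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_mean = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_line / ss_mean


# Hypothetical responses and their least-squares predictions.
ys = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(ys, predicted)
print(f"r^2 = {r2:.2f}")
```

Here the sum of squared residuals drops from 6 (using the mean) to 2.4 (using the line), so the line accounts for 60% of the variability in $y$.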
Final Thoughts on Regression Analysis
Together, $s$ and $r^2$ show how well the regression model predicts: $s$ gives the size of a typical prediction error in the units of $y$, and $r^2$ gives the proportion of the variation in $y$ that the model explains.
It's crucial to report both statistics alongside the regression output for comprehensive model assessment.
Example Problem - Using Summary Statistics to Calculate Regression Line
To derive the least-squares regression equation mathematically, use: $b = r\dfrac{s_y}{s_x}$, with $a = \bar{y} - b\bar{x}$, where $s_x$ and $s_y$ are the standard deviations of $x$ and $y$.
For example, using data from a sample of students:
Mean foot length: $\bar{x} = 24.76$ cm, mean height: $\bar{y} = 171.43$ cm, and correlation $r = 0.697$.
Calculation results in:
Slope $b = 2.75$ and y-intercept $a = 171.43 - 2.75(24.76) = 103.34$, giving the equation: $\hat{y} = 103.34 + 2.75x$
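The two formulas can be sketched as a small function. The name `lsrl_from_summary` is an assumption, and so are the standard deviations `s_x = 2.71` and `s_y = 10.69`: these notes do not list them, so the values here are placeholders chosen to be consistent with the stated slope of 2.75.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept a, slope b) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Means and correlation from the example above.
x_bar, y_bar, r = 24.76, 171.43, 0.697

# ASSUMED standard deviations (not given in these notes).
s_x, s_y = 2.71, 10.69

a, b = lsrl_from_summary(x_bar, s_x, y_bar, s_y, r)
print(f"slope = {b:.2f}, intercept = {a:.2f}")
```

With these assumed inputs the slope rounds to 2.75. The intercept comes out slightly different from 103.34 because the worked example rounds the slope before computing $a$, while the function carries the unrounded slope through.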