2.5 Regression
- Correlation and Regression
- A firm grasp of the use and limitations of both correlation and regression is essential for effective data analysis.
Residuals
- Definition of Residual
- A residual is the difference between an observed value and the value predicted by the regression line.
- It can be mathematically expressed as:
\[ \text{residual} = \text{observed } y - \text{predicted } y = y - \tilde{y} \]
where \( \tilde{y} \) denotes the predicted value from the regression model.
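The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the data values and the helper `fit_line` are made up for the example, and the line is fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)

# Predicted values from the regression line (the y-tilde of the notes).
y_pred = [a + b * x for x in xs]

# residual = observed y - predicted y
residuals = [y - yp for y, yp in zip(ys, y_pred)]

for x, y, yp, r in zip(xs, ys, y_pred, residuals):
    print(f"x={x:.1f}  observed={y:.2f}  predicted={yp:.2f}  residual={r:+.2f}")
```

A positive residual means the observed point lies above the fitted line; a negative residual means it lies below.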
Residual Plots
- Definition of Residual Plot
- A residual plot is a scatterplot of the regression residuals against the explanatory variable.
- Purpose of Residual Plots
- Residual plots help assess the fit of a regression line.
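- The mean of the least-squares residuals is always zero, so the horizontal line at residual = 0 is drawn on a residual plot as a reference. When the line fits well, the points scatter around it with no systematic pattern.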
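As a rough sketch of what a residual plot reveals, the example below fits a straight line to data that actually follow a curve and lists the points that would be plotted (explanatory variable on the horizontal axis, residual on the vertical axis). The data and the helper `fit_line` are assumptions made for illustration, and the plotting step itself is left out.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical curved data: y = x^2, which a straight line cannot capture.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# The residuals average to zero, as they always do for a least-squares line.
mean_residual = sum(residuals) / len(residuals)

# Points of the residual plot: (explanatory value, residual).
plot_points = list(zip(xs, residuals))
for x, r in plot_points:
    print(f"x={x:.0f}  residual={r:+.1f}")
```

Here the residuals are positive at both ends and negative in the middle. A curved pattern like this around the zero line suggests that a straight line is not a good description of the data, even though the residuals still average to zero.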
Outliers and Influential Observations in Regression
- Definition of Outlier
- An outlier is defined as an observation that lies outside the overall pattern of the other observations.
- Definition of Influential Observation
- An observation is influential for a statistical calculation if removing it would markedly change the result of that calculation.
- Not all outliers are influential: a point can lie outside the overall pattern and still have little effect on the statistical results.
- Points that are outliers in the x direction are often influential for the least-squares regression line, although this is not always the case.
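One informal way to check influence is to refit the line with and without the point in question and compare the results. The sketch below does this for two made-up points added to a small hypothetical data set: one that is an outlier in the y direction and one that is an outlier in the x direction. The data and the helper `fit_line` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical base data with a roughly linear pattern.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0]
base_y = [1.1, 1.9, 3.2, 3.8, 5.0]

# Case 1: add an outlier in the y direction, at the middle of the x range.
# Case 2: add an outlier in the x direction, far to the right of the rest.
_, slope_base = fit_line(base_x, base_y)
_, slope_y_outlier = fit_line(base_x + [3.0], base_y + [9.0])
_, slope_x_outlier = fit_line(base_x + [15.0], base_y + [3.0])

print(f"slope without extra point:  {slope_base:.3f}")
print(f"slope with y-outlier (3,9): {slope_y_outlier:.3f}")
print(f"slope with x-outlier (15,3): {slope_x_outlier:.3f}")
```

In this example the y-direction outlier leaves the slope unchanged (although it does shift the intercept), while the x-direction outlier pulls the slope from about 0.97 down to below 0.1. Whether a point counts as influential therefore depends on which result of the analysis is being examined.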
Lurking Variables
- Definition of Lurking Variable
- A lurking variable is a variable that is not among the explanatory or response variables in a study yet may influence the interpretation of relationships between those variables.
Association Does Not Imply Causation
- Key Concept
- An association between an explanatory variable \( x \) and a response variable \( y \), no matter how strong, does not by itself provide evidence that changes in \( x \) cause changes in \( y \).
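A small simulation can illustrate how a lurking variable produces a strong association without any causal link. In the sketch below, a hypothetical variable `z` drives both `x` and `y`, but `x` never enters the formula for `y`. All names, noise levels, and the sample size are assumptions chosen for the example.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# z is the lurking variable; x and y each depend on z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

# x and y are strongly associated even though neither causes the other.
r_xy = correlation(x, y)

# Once the contribution of z is removed, the association disappears.
x_adj = [xi - zi for xi, zi in zip(x, z)]
y_adj = [yi - zi for yi, zi in zip(y, z)]
r_adj = correlation(x_adj, y_adj)

print(f"correlation of x and y:            {r_xy:.2f}")
print(f"correlation after removing z:      {r_adj:.2f}")
```

In real data the lurking variable is usually unmeasured, so it cannot simply be subtracted out as it is here. The simulation only shows that a strong correlation between `x` and `y` is fully compatible with `x` having no effect on `y` at all.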