2.5

Regression

  • Correlation and Regression
    • A firm grasp of the use and limitations of both correlation and regression is essential for effective data analysis.

Residuals

  • Definition of Residual
    • A residual is the difference between an observed value and the value predicted by the regression line.
    • It can be mathematically expressed as:
      \[ \text{residual} = y - \tilde{y} \]
      where \( y \) is the observed value and \( \tilde{y} \) denotes the predicted value from the regression model.
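
As a minimal sketch of the definition above, the following Python computes residuals for a small data set. The data values and the helper name `least_squares` are hypothetical, invented for illustration; the fit uses the standard least-squares formulas for slope and intercept.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = least_squares(xs, ys)

# Predicted value for each x, then residual = observed - predicted.
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

for x, y, p, r in zip(xs, ys, predicted, residuals):
    print(f"x={x:.1f}  observed={y:.1f}  predicted={p:.2f}  residual={r:+.2f}")
```

For this data the fitted line has slope about 0.6 and intercept about 2.2. A positive residual means the point lies above the line, a negative one means it lies below.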

Residual Plots

  • Definition of Residual Plot
    • A residual plot is a scatterplot of the regression residuals against the explanatory variable.
  • Purpose of Residual Plots
    • Residual plots help assess the fit of a regression line.
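    • The mean of the least-squares residuals is always zero, so a horizontal line at residual = 0 is drawn on a residual plot as a reference. Points scattered around that line with no systematic pattern suggest that a straight line fits the data well.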
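
To illustrate how a residual plot can reveal a poor fit, here is a sketch that fits a straight line to deliberately curved data and lists the points a residual plot would show. The data and the helper names `least_squares` and `residual_points` are hypothetical. Any plotting library could draw the returned pairs against a horizontal line at zero; plotting is left out to keep the example dependency-free.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_points(xs, ys):
    """Return the (x, residual) pairs that a residual plot displays."""
    intercept, slope = least_squares(xs, ys)
    return [(x, y - (intercept + slope * x)) for x, y in zip(xs, ys)]


# Hypothetical curved data (y = x squared), so a straight line is the wrong model.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

points = residual_points(xs, ys)
mean_residual = sum(r for _, r in points) / len(points)

for x, r in points:
    print(f"x={x:.0f}  residual={r:+.1f}")
print(f"mean residual = {mean_residual:.1f}")
```

The residuals average zero, as they do for any least-squares fit, yet they are positive at both ends and negative in the middle. That curved pattern around the zero line is the kind of signal a residual plot makes visible.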

Outliers and Influential Observations in Regression

  • Definition of Outlier
    • An outlier is defined as an observation that lies outside the overall pattern of the other observations.
  • Definition of Influential Observation
    • An observation is influential for a statistic if removing it would markedly change the result of the analysis.
    • Not all outliers are influential: a point can lie outside the overall pattern and still change the results very little when it is removed.
    • Points that are outliers in the x direction tend to be influential, although this is not always the case.
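
One way to check influence, sketched below under invented data, is to refit the line with and without the suspect point and compare the results. The helper names `least_squares` and `fit_without` are hypothetical. Case A adds an outlier in the y direction near the middle of the x values; Case B adds an outlier in the x direction that does not follow the pattern of the other points.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_without(xs, ys, index):
    """Refit the line after dropping the observation at position `index`."""
    kept_x = xs[:index] + xs[index + 1:]
    kept_y = ys[:index] + ys[index + 1:]
    return least_squares(kept_x, kept_y)


# Hypothetical base data lying exactly on the line y = x.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
base_y = list(base_x)

cases = {
    "A: outlier in y, x near the middle": (base_x + [5.0], base_y + [9.0]),
    "B: outlier in x, off the pattern": (base_x + [15.0], base_y + [5.0]),
}

results = {}
for label, (xs, ys) in cases.items():
    with_point = least_squares(xs, ys)
    without_point = fit_without(xs, ys, len(xs) - 1)  # drop the added point
    results[label] = (with_point, without_point)
    print(label)
    print(f"  with point:    intercept={with_point[0]:.2f}, slope={with_point[1]:.2f}")
    print(f"  without point: intercept={without_point[0]:.2f}, slope={without_point[1]:.2f}")
```

In this constructed example the Case A point leaves the slope unchanged and only shifts the intercept slightly, while the Case B point pulls the slope from about 1 down to about 0.4. Real data will rarely separate this cleanly, so the comparison is a diagnostic aid, not a rule.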

Lurking Variables

  • Definition of Lurking Variable
    • A lurking variable is a variable that is not among the explanatory or response variables in a study yet may influence the interpretation of relationships between those variables.

Association Does Not Imply Causation

  • Key Concept
    • An association between an explanatory variable \( x \) and a response variable \( y \), no matter how strong, does not by itself provide evidence that changes in \( x \) cause changes in \( y \).
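
The following simulation is a sketch of how a lurking variable can produce a strong association without any causal link. Everything in it is an assumption made for illustration: the variable `z` plays the lurking variable, `x` and `y` each depend on `z` plus independent noise, and `x` never enters the formula for `y`. The helper names `correlation` and `simulate` are hypothetical.

```python
import math
import random


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, rng, z_fixed=None):
    """Generate (xs, ys) driven by a lurking variable z; x has no effect on y."""
    xs, ys = [], []
    for _ in range(n):
        # z varies from case to case unless it is held at a fixed value.
        z = rng.uniform(0.0, 10.0) if z_fixed is None else z_fixed
        xs.append(z + rng.gauss(0.0, 1.0))  # x depends on z only
        ys.append(z + rng.gauss(0.0, 1.0))  # y depends on z only
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable

xs, ys = simulate(500, rng)
r_varying = correlation(xs, ys)

xs_fixed, ys_fixed = simulate(500, rng, z_fixed=5.0)
r_fixed = correlation(xs_fixed, ys_fixed)

print(f"r when z varies:        {r_varying:.2f}")
print(f"r when z is held fixed: {r_fixed:.2f}")
```

When `z` varies, `x` and `y` are strongly correlated even though changing `x` would do nothing to `y`. Once `z` is held fixed, the association essentially disappears, which is why a strong correlation alone cannot establish causation.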