Notes on R-squared in Regression Analysis

Acknowledgment of Country

  • Respect for Noongar country and its elders, past, present, and emerging.

Understanding R-squared

  • Definition of R-squared: A statistical measure of the proportion of variability in the response variable that is accounted for by the explanatory variable(s).
    • Range: $[0, 1]$
    • R-squared = 1: Perfect explanation of variability by the model.
    • R-squared = 0: No explanation of variability.
  • Interpretation of Values: Needs context; a higher R-squared value does not always indicate a better model.
    • Context-specific analysis:
      • In manufacturing: A high R-squared may indicate that process variability is well controlled.
      • In human studies: A small R-squared can still accompany a statistically significant association that matters in practice.
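
The two endpoints of the range can be checked numerically. The sketch below is illustrative only: the helper `r_squared` and the four data points are made up for this example, and the function uses the common form 1 minus (unexplained variation / total variation).

```python
def r_squared(observed, predicted):
    """R-squared as 1 - (unexplained variation / total variation)."""
    mean_y = sum(observed) / len(observed)
    total = sum((y - mean_y) ** 2 for y in observed)
    unexplained = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - unexplained / total


# Hypothetical response values, chosen only to keep the arithmetic simple.
observed = [2.0, 4.0, 6.0, 8.0]
mean_y = sum(observed) / len(observed)

# Predictions that match every observation: all variability is explained.
perfect = r_squared(observed, observed)

# Predictions that are just the mean: none of the variability is explained.
mean_only = r_squared(observed, [mean_y] * len(observed))

print(perfect, mean_only)
```

Predicting the mean for every observation is the baseline that any explanatory model is compared against, which is why it scores exactly zero.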

Limitations of R-squared

  • Does not have a universal benchmark for what constitutes an acceptable value.
  • Misleading if considered in isolation, especially in complex models.
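
One way to see why R-squared can mislead in isolation is to fit a straight line to data that is clearly curved. The sketch below uses made-up data (y = x squared for x from 0 to 10) and a hypothetical helper `fit_line`; it is an illustration, not a diagnostic procedure from these notes.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up curved data: the true relationship is quadratic, not linear.
xs = list(range(11))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

mean_y = sum(ys) / len(ys)
total = sum((y - mean_y) ** 2 for y in ys)
unexplained = sum(r ** 2 for r in residuals)
r2 = 1 - unexplained / total

print(round(r2, 3))  # roughly 0.93, which looks like a good fit
print(residuals)     # positive at both ends, negative in the middle
```

The R-squared value is high, yet the residuals follow a systematic pattern, which signals that a straight line is the wrong shape for this data. A residual check alongside R-squared catches what the single number hides.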

Regression Line Determination

  • Best Fit Line: The line that minimizes the sum of squared differences between observed and predicted values.
    • Coefficients Involved: Intercept and slope, which define the regression line mathematically.
  • Minimization Process:
    • Squared differences emphasize larger residuals (errors) more than smaller ones.
    • Squaring makes every residual non-negative, so positive and negative errors cannot cancel each other out, and it leads to a minimization problem with a simple closed-form solution.
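
In symbols, assuming a single explanatory variable and writing $b_0$ for the intercept and $b_1$ for the slope (notation chosen here for illustration, not taken from the notes), the best fit line minimizes

$$S(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - (b_0 + b_1 x_i) \right)^2$$

Setting the partial derivatives of $S$ with respect to $b_0$ and $b_1$ to zero gives

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means.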

Least Squares Method

  • The method used to determine the regression line by minimizing squared residuals.
  • Known as the Least Squares line of Best Fit.
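
A minimal sketch of the least squares method for one explanatory variable follows. The function names and the five data points are hypothetical; the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data lying close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)

# Any other line gives a larger sum of squared residuals.
nudged_slope = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
nudged_intercept = sum_squared_residuals(xs, ys, intercept + 0.1, slope)

print(intercept, slope)
print(best < nudged_slope, best < nudged_intercept)
```

Nudging either coefficient away from the fitted value increases the sum of squared residuals, which is what "best fit" means here.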

Understanding Variance and Sums of Squares

  • Total Sum of Squares (TSS): Overall variability in the response variable.
    • TSS is the sum of squared differences between each observation and the mean of the response (the null, mean-only model).
  • Regression Sum of Squares (RSS): Variability explained by the regression model.
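    • The sum of squared differences between the regression model's predictions and the mean of the response (the null model).
  • Residual Sum of Squares (ESS), also called the error sum of squares: Variation not explained by the model.
    • The sum of squared differences between the observed data and the values predicted by the regression line.
    • Note on naming: many texts swap these abbreviations, using RSS for the residual sum of squares and ESS for the explained sum of squares.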
  • Relationship between sums of squares:
    $\text{TSS} = \text{RSS} + \text{ESS}$
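
The decomposition can be verified numerically, and it leads directly to R-squared as $\text{RSS} / \text{TSS}$, or equivalently $1 - \text{ESS} / \text{TSS}$. The sketch below follows the naming used in these notes (RSS for the regression sum of squares, ESS for the residual sum of squares); the data points and variable names are made up for illustration, and the identity assumes a least squares line with an intercept.

```python
# Made-up data lying close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept for one explanatory variable.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

# Total: observed values versus the mean (null model).
tss = sum((y - mean_y) ** 2 for y in ys)
# Regression: predicted values versus the mean.
rss = sum((p - mean_y) ** 2 for p in predicted)
# Residual: observed values versus predicted values.
ess = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = rss / tss

print(tss, rss + ess)      # equal up to floating-point rounding
print(round(r_squared, 3))  # close to 1 for this nearly linear data
```

The same value comes out of `1 - ess / tss`, so either form can be used once the three sums are available.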

Visual Representation

  • Total Sum of Squares (TSS): Visualized as the box representing the total variation from the mean line.
  • Explained and Unexplained Variance: The visual representation helps assess how much variance is attributed to the explanatory variable versus what remains unexplained.

Conclusion on R-squared

  • R-squared is crucial for understanding model performance but must be interpreted in context.
  • It is a component of quantitative analysis that guides decisions with data-driven insights.