Notes on R-squared in Regression Analysis
Acknowledgment of Country
- Respect for Noongar country and its elders, past, present, and emerging.
Understanding R-squared
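- Definition of R-squared: A statistical measure of the proportion of the variability in the response variable that is accounted for by the explanatory variable(s).
- Range: $[0, 1]$ (for a least-squares fit with an intercept, evaluated on the data used to fit it)
  - R-squared = 1: The model explains all of the variability (every observation lies on the fitted line).
  - R-squared = 0: The model explains none of the variability (it does no better than predicting the mean).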
- Interpretation of Values: Always depends on context; a higher R-squared value does not always indicate a better model.
- Context-specific analysis:
  - In manufacturing: A high R-squared may indicate that variability in the process is well understood and controlled.
  - In human studies: R-squared values tend to be low, yet small, statistically significant associations may still be important.
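Written out for observations $y_i$, fitted values $\hat{y}_i$ and the mean response $\bar{y}$ (symbols introduced here for illustration), the usual definition makes the two endpoints of the range explicit. The numerator is the residual sum of squares and the denominator the total sum of squares:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

If every residual $y_i - \hat{y}_i$ is zero, the fraction vanishes and $R^2 = 1$. If the model predicts $\bar{y}$ for every observation, the numerator equals the denominator and $R^2 = 0$.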
Limitations of R-squared
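- There is no universal benchmark for what constitutes an acceptable value.
- Misleading if considered in isolation, especially in complex models: R-squared never decreases when more explanatory variables are added, even if they are unrelated to the response.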
Regression Line Determination
- Best Fit Line: The line that minimizes the sum of squared differences between observed and predicted values.
- Coefficients Involved: The intercept $b_0$ and slope $b_1$, which define the regression line $\hat{y} = b_0 + b_1 x$.
- Minimization Process:
  - Squaring the differences weights larger residuals (errors) more heavily than smaller ones.
  - Squaring also makes every term non-negative, so positive and negative residuals cannot cancel out, and it yields a closed-form solution for the coefficients.
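For a single explanatory variable, the minimization can be carried out explicitly. Writing the sum of squared residuals as a function $S$ of the coefficients (notation introduced here for illustration) and setting both partial derivatives to zero gives the standard closed-form solution:

$$S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

$$\frac{\partial S}{\partial b_0} = -2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0, \qquad \frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0$$

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

The first condition says the residuals sum to zero, which is equivalent to $\bar{y} = b_0 + b_1 \bar{x}$: the fitted line passes through $(\bar{x}, \bar{y})$. Because $S$ is a convex quadratic in $b_0$ and $b_1$ (provided the $x_i$ are not all equal), this stationary point is the minimum.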
Least Squares Method
- The method used to determine the regression line by minimizing the sum of squared residuals.
- The resulting line is known as the least-squares line of best fit.
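The closed-form least-squares solution for one explanatory variable can be sketched in plain Python. This is a minimal illustration, not a production routine: the helper names `least_squares_line` and `sum_squared_residuals` and the data values are hypothetical. The final check nudges each coefficient and confirms that the sum of squared residuals goes up, which is what "best fit" means here.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 2.2 0.6

# Nudging either coefficient away from the least-squares values increases the error.
best = sum_squared_residuals(xs, ys, b0, b1)
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_residuals(xs, ys, b0 + db0, b1 + db1) > best
```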
Understanding Variance and Sums of Squares
- Total Sum of Squares (TSS): Overall variability in the response variable.
  - TSS is the sum of squared differences between each observation and the mean of the response, i.e. the squared errors of the null (mean-only) model.
- Regression Sum of Squares (RSS): Variability explained by the regression model.
  - The sum of squared differences between the fitted values and the mean, i.e. the reduction in variability achieved by the regression model relative to the null model.
- Residual (Error) Sum of Squares (ESS): Variation not explained by the model.
  - The sum of squared differences between the observed values and the fitted values on the regression line (the residuals).
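- Relationship between sums of squares (for a least-squares fit that includes an intercept):
$$\text{TSS} = \text{RSS} + \text{ESS}$$
- Consequently, $R^2 = \dfrac{\text{RSS}}{\text{TSS}} = 1 - \dfrac{\text{ESS}}{\text{TSS}}$.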
Visual Representation
- Total Sum of Squares (TSS): Visualized as the squares (boxes) formed by each observation's vertical distance from the mean line; their combined area represents the total variation.
- Explained and Unexplained Variance: The visual representation helps assess how much variance is attributed to the explanatory variable versus what remains unexplained.
Conclusion on R-squared
- R-squared is crucial for understanding model performance but must be interpreted in context.
- It is a component of quantitative analysis that guides decisions with data-driven insights.