Regression Analysis and Diagnostics Notes
Multiple Regression Analysis
Overview of Regression
- Types of Regression: Simple regression predicts a dependent variable (Y) from a single predictor; multiple regression uses two or more predictors.
- Method of Least Squares: Fits the model by minimizing the sum of squared prediction errors (residuals).
- Model Evaluation Metrics:
- Goodness of Fit:
- R: Correlation coefficient, measures strength and direction of a linear relationship.
- R²: Proportion of variance in Y explained by the model.
- Adjusted R²: Modified version of R² that adjusts for the number of predictors in the model.
- Significance of Model:
- F-statistic: Tests if the overall model is significant.
- Individual Predictor Contribution (b values):
- t-Test: Tests if predictors significantly differ from zero.
- Relative Importance (β): Standardized coefficients that indicate the relative contribution of each predictor.
- Unique Proportion of Variance (sr²): Unique variance attributable to a specific predictor after accounting for others.
Model Fit Statistics
- R² = 0.209: This indicates that approximately 20.9% of the variability in exam performance is explained by the model factors (e.g., exam anxiety and time spent revising).
- Adjusted R² = 0.193: Adjusts R² for the number of predictors, giving a less inflated estimate of the variance explained in the population.
- Model significantly better than no model:
- F(2, 100) = 13.2, p < 0.001 indicates statistical significance.
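The fit statistics above (R², adjusted R², and the F-test) can be reproduced from first principles. Here is a minimal numpy sketch on synthetic data; the variable names, seed, and generated values are invented for illustration and are not the original dataset:

```python
import numpy as np

# Hypothetical data: 103 students (so residual df = 103 - 2 - 1 = 100),
# two predictors: hours of revision and exam anxiety.
rng = np.random.default_rng(42)
n = 103
time_rev = rng.uniform(0, 40, n)
anxiety = rng.uniform(20, 90, n)
exam_perf = 88 + 0.24 * time_rev - 0.49 * anxiety + rng.normal(0, 15, n)

# Design matrix with an intercept column; least-squares fit.
X = np.column_stack([np.ones(n), time_rev, anxiety])
b, *_ = np.linalg.lstsq(X, exam_perf, rcond=None)

y_hat = X @ b
ss_res = np.sum((exam_perf - y_hat) ** 2)   # residual sum of squares
ss_tot = np.sum((exam_perf - exam_perf.mean()) ** 2)

k = 2  # number of predictors
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, F({k}, {n - k - 1}) = {f_stat:.1f}")
```

Because the synthetic data differ from the original, the printed numbers will not match 0.209 and 13.2; the formulas, however, are the standard ones.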
Regression Equation Example
- The model equation can be expressed as:
\text{ExamPerf} = 87.833 + 0.241(\text{TimeRev}) - 0.485(\text{Anxiety})
- Interpretation:
- For every additional hour spent revising, predicted exam performance increases by 0.241 points, holding anxiety constant; each one-unit increase in anxiety lowers predicted performance by 0.485 points, holding revision time constant.
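The interpretation of the b values can be checked by plugging values into the equation. The student values below are made up for illustration:

```python
# Fitted equation from the notes:
# ExamPerf = 87.833 + 0.241*TimeRev - 0.485*Anxiety
def predict(time_rev, anxiety):
    return 87.833 + 0.241 * time_rev - 0.485 * anxiety

# A (hypothetical) student revising 20 hours with anxiety score 50:
base = predict(20, 50)
# One extra hour of revision at the same anxiety level raises the
# prediction by exactly b = 0.241:
plus_one = predict(21, 50)
print(round(base, 3), round(plus_one - base, 3))
```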
Statistical Significance of Predictors
- Regression Coefficients (b values) and their t-tests:
- Time Rev (b = 0.241):
- t-value: t(100) = 1.24, p = 0.18 (not statistically significant)
- Exam Anxiety (b = -0.485):
- t-value: t(100) = -2.544, p = 0.01 (statistically significant)
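The t-tests above come from dividing each coefficient by its standard error, with df equal to the residual degrees of freedom. A sketch on synthetic data (names, seed, and values invented for illustration, not the original dataset):

```python
import numpy as np
from scipy import stats

# Illustrative coefficient t-tests; n = 103 gives residual df = 100.
rng = np.random.default_rng(1)
n = 103
time_rev = rng.uniform(0, 40, n)
anxiety = rng.uniform(20, 90, n)
y = 88 + 0.24 * time_rev - 0.49 * anxiety + rng.normal(0, 15, n)

X = np.column_stack([np.ones(n), time_rev, anxiety])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
df = n - X.shape[1]                # residual degrees of freedom = 100
sigma2 = resid @ resid / df        # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # coefficient SEs
t_vals = b / se
p_vals = 2 * stats.t.sf(np.abs(t_vals), df)  # two-tailed p-values
for name, t, p in zip(["intercept", "TimeRev", "Anxiety"], t_vals, p_vals):
    print(f"{name}: t({df}) = {t:.2f}, p = {p:.4f}")
```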
Semi-Partial Correlation
- Findings:
- Zero-order correlation between revision time and performance: r = 0.397.
- After partialling anxiety out of revision time, the semi-partial correlation falls to sr = 0.119.
- Effect of Anxiety:
- Zero-order correlation with performance: r = -0.709, reduced to a semi-partial correlation of sr = -0.226 after controlling for revision time.
- Unique Variance from Anxiety:
- (-0.226)² ≈ 0.051, indicating that approximately 5% of the variance in exam performance is uniquely accounted for by anxiety (sr²).
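A semi-partial correlation can be computed by residualizing one predictor on the others, and its square equals the drop in R² when that predictor is removed. A numpy sketch on synthetic, deliberately correlated data (all values invented for illustration):

```python
import numpy as np

# Unique contribution of anxiety = correlation between performance and
# the part of anxiety not shared with revision time.
rng = np.random.default_rng(7)
n = 103
time_rev = rng.uniform(0, 40, n)
anxiety = 60 - 0.8 * time_rev + rng.normal(0, 10, n)  # correlated predictors
y = 88 + 0.24 * time_rev - 0.49 * anxiety + rng.normal(0, 10, n)

def r2(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Residualize anxiety on time_rev, then correlate with y -> semi-partial r.
A = np.column_stack([np.ones(n), time_rev])
anx_resid = anxiety - A @ np.linalg.lstsq(A, anxiety, rcond=None)[0]
sr = np.corrcoef(y, anx_resid)[0, 1]

# Equivalent check: sr^2 equals the R^2 drop when anxiety is removed.
delta_r2 = r2(np.column_stack([time_rev, anxiety]), y) - r2(time_rev.reshape(-1, 1), y)
print(f"sr = {sr:.3f}, sr^2 = {sr**2:.3f}, delta R^2 = {delta_r2:.3f}")
```

The equality sr² = ΔR² is what justifies reading sr² as the "unique proportion of variance" for a predictor.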
Bias in Regression
- Sources of Bias:
- Violations of assumptions and outliers impact regression estimates significantly:
- Assumptions:
- Linearity: The relationship between variables must be linear.
- Normality: Residuals must be normally distributed.
- Homoscedasticity: The variance of the residuals should be constant across levels of the predictors.
- Independence: Observations must be independent.
- No Multicollinearity: Predictors should not be highly correlated.
- Outliers: Extreme values can disproportionately influence results by skewing estimates and standard errors.
Identifying Outliers
- Diagnostics and Metrics:
- Residuals: Differences between observed and predicted values. Standardize them to flag outliers:
- Roughly 95% of standardized residuals should fall between -2 and 2; a noticeably larger share outside that range, or any value beyond ±3, warrants a closer look.
- Mahalanobis Distance: Helps identify outliers based on multivariate distance from the centroid.
- Cook’s Distance: Assesses the impact of each observation on the overall regression results.
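The three diagnostics above can be computed directly from the hat matrix and the residuals. A numpy sketch on synthetic data (seed and values invented for illustration):

```python
import numpy as np

# Outlier/influence diagnostics: standardized residuals, Mahalanobis
# distance of the predictors, and Cook's distance.
rng = np.random.default_rng(3)
n = 103
Xp = rng.normal(size=(n, 2))            # two predictors
y = 1 + Xp @ np.array([0.5, -0.8]) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), Xp])
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat (projection) matrix
h = np.diag(H)                          # leverage values
resid = y - H @ y
p = X.shape[1]
mse = resid @ resid / (n - p)

std_resid = resid / np.sqrt(mse * (1 - h))     # internally studentized
cooks_d = std_resid**2 * h / (p * (1 - h))     # Cook's distance

# Mahalanobis distance of each predictor row from the centroid.
diff = Xp - Xp.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(Xp, rowvar=False))
mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

print(f"{(np.abs(std_resid) > 2).mean():.0%} of standardized residuals exceed |2|")
print(f"max Cook's D = {cooks_d.max():.3f}, max Mahalanobis = {mahal.max():.2f}")
```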
Assumptions Checking
- Independence: Check with the Durbin-Watson test for autocorrelation in residuals.
- Normality of Residuals: Assess using histograms and Q-Q plots.
- Linearity: Verify through scatter plots of standardized predicted vs. standardized residuals.
- Homoscedasticity: Visualize spread in residuals against predicted values (look for randomness).
- No Multicollinearity: Calculate variance inflation factor (VIF); VIF > 10 indicates a problem.
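Two of these checks, Durbin-Watson and VIF, reduce to short formulas on the residuals and predictors. A numpy sketch on synthetic data with deliberately correlated predictors (all values invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 103
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(0, 0.5, n)   # deliberately correlated with x1
y = 2 + x1 - x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Durbin-Watson: values near 2 suggest no autocorrelation;
# values below 1 or above 3 are a cause for concern.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# VIF_j = 1 / (1 - R^2_j), where R^2_j regresses predictor j on the others.
def vif(preds, j):
    others = np.delete(preds, j, axis=1)
    A = np.column_stack([np.ones(len(preds)), others])
    r = preds[:, j] - A @ np.linalg.lstsq(A, preds[:, j], rcond=None)[0]
    r2_j = 1 - r @ r / np.sum((preds[:, j] - preds[:, j].mean()) ** 2)
    return 1 / (1 - r2_j)

preds = np.column_stack([x1, x2])
print(f"Durbin-Watson = {dw:.2f}, VIF(x1) = {vif(preds, 0):.2f}, "
      f"VIF(x2) = {vif(preds, 1):.2f}")
```

With only two predictors the two VIFs are equal; VIF values above 10 (some texts use 5) flag problematic multicollinearity.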
Model Evaluation
- Final Steps: Once the assumptions hold and outliers have been assessed, the model can be treated as valid and interpretation can proceed.
- Significant regression findings then support accepting the conclusions and drawing out their practical implications.