Regression Analysis and Diagnostics Notes

Multiple Regression Analysis

Overview of Regression

  • Types of Regression: Simple and multiple regression used to predict a dependent variable (Y) based on multiple predictors.
  • Method of Least Squares: Used to minimize prediction errors in regression models.
  • Model Evaluation Metrics:
    • Goodness of Fit:
      • R: Correlation coefficient; measures the strength and direction of a linear relationship.
      • R²: Proportion of variance in Y explained by the model.
      • Adjusted R²: Modified version of R² that adjusts for the number of predictors in the model.
    • Significance of Model:
      • F-statistic: Tests whether the overall model is significant.
    • Individual Predictor Contribution (b values):
      • t-Test: Tests if predictors significantly differ from zero.
      • Relative Importance (β): Standardized coefficients that indicate the relative contribution of each predictor.
      • Unique Proportion of Variance (sr²): Unique variance attributable to a specific predictor after accounting for others.
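The fit statistics above can be sketched with a plain least-squares fit in NumPy; the data below are simulated and the variable names (time_rev, anxiety) only mirror the notes' example, they are not the real dataset:

```python
# Least-squares fit plus R², adjusted R², and F, using NumPy only.
# All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 103                                    # sample size used later in the notes
time_rev = rng.uniform(0, 20, n)           # hours spent revising (hypothetical)
anxiety = rng.uniform(0, 100, n)           # exam anxiety score (hypothetical)
y = 50 + 0.3 * time_rev - 0.2 * anxiety + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), time_rev, anxiety])  # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)             # least-squares coefficients

y_hat = X @ b
ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
k = X.shape[1] - 1                         # number of predictors

r2 = 1 - ss_res / ss_tot                              # goodness of fit
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)         # penalized for k predictors
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))          # overall model test
```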

Model Fit Statistics

  • R² = 0.209: This indicates that approximately 20.9% of the variability in exam performance is explained by the model factors (e.g., exam anxiety and time spent revising).
  • Adjusted R² = 0.193: R² corrected for the number of predictors; it penalizes predictors that add little explanatory power and better estimates how the model would generalize.
  • Model fits significantly better than a model with no predictors (i.e., just the mean of Y):
    • F(2, 100) = 13.2, p < 0.001 indicates statistical significance.
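As a consistency check, both the F-statistic and the adjusted R² can be recovered from R², the number of predictors, and the sample size with the standard formulas (the numbers below come straight from the notes):

```python
# Recovering F(2, 100) and adjusted R² from the reported R² = 0.209.
r2, k, n = 0.209, 2, 103                   # R², predictors, sample size (df = n - k - 1 = 100)

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))     # ≈ 13.2, matching the reported F
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # ≈ 0.193, matching the reported adjusted R²
```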

Regression Equation Example

  • The model equation can be expressed as: ExamPerf = 87.833 + 0.241(TimeRev) − 0.485(Anxiety)
    • Interpretation:
      • For every additional hour spent revising, predicted exam performance increases by 0.241 points, holding anxiety constant; likewise, each additional unit of anxiety lowers predicted performance by 0.485 points, holding revision time constant.
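A minimal sketch of using the fitted equation for prediction (the input values are hypothetical; the coefficients are the ones reported above):

```python
# Prediction from the fitted equation in the notes.
def exam_perf(time_rev, anxiety):
    """Predicted exam performance from hours revised and anxiety score."""
    return 87.833 + 0.241 * time_rev - 0.485 * anxiety

# One extra hour of revising, anxiety held fixed, raises the
# prediction by exactly the slope on TimeRev:
delta = exam_perf(11, 50) - exam_perf(10, 50)   # ≈ 0.241
```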

Statistical Significance of Predictors

  • Unstandardized Regression Coefficients (b) and their t-tests:
    • Time Rev (b = 0.241):
      • t(100) = 1.24, p = 0.18 (not statistically significant)
    • Exam Anxiety (b = -0.485):
      • t(100) = -2.544, p = 0.01 (statistically significant)
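Each t-value is the coefficient divided by its standard error, so the standard errors implied by the reported b and t values can be backed out (a sketch; the 1.984 critical value for df = 100 is quoted from standard two-tailed t tables, not computed here):

```python
# Implied standard errors: SE(b) = b / t, using the values reported above.
b_rev, t_rev = 0.241, 1.24
b_anx, t_anx = -0.485, -2.544

se_rev = b_rev / t_rev        # ≈ 0.194
se_anx = b_anx / t_anx        # ≈ 0.191

# With df = 100, |t| must exceed roughly 1.984 for p < .05 (two-tailed),
# so only the anxiety coefficient clears the threshold:
rev_significant = abs(t_rev) > 1.984      # False
anx_significant = abs(t_anx) > 1.984      # True
```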

Semi-Partial Correlation

  • Findings:
    • Zero-order correlation of revision with performance: r = 0.397
      • After partialling anxiety out of revision, the semi-partial correlation drops to sr = 0.119
    • Effect of Anxiety:
      • Zero-order correlation with performance r = -0.709, reduced to sr = -0.226 after controlling for revision.
    • Unique Variance from Anxiety:
      • sr² = (-0.226)² ≈ 0.051, so roughly 5% of the variance in exam performance is uniquely accounted for by anxiety.
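The semi-partial correlation can be computed by residualizing one predictor on the other and correlating what remains with Y. A NumPy sketch on simulated data (the variable roles mirror the notes; the numbers do not):

```python
# Semi-partial (part) correlation via residualizing, on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 103
x2 = rng.normal(size=n)                    # e.g. anxiety
x1 = -0.6 * x2 + rng.normal(size=n)        # e.g. revision, correlated with anxiety
y = 0.3 * x1 - 0.5 * x2 + rng.normal(size=n)

# Regress x1 on x2 (with intercept) and keep the residual:
Z = np.column_stack([np.ones(n), x2])
x1_resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

zero_order = np.corrcoef(y, x1)[0, 1]      # r: raw correlation of x1 with y
sr = np.corrcoef(y, x1_resid)[0, 1]        # sr: semi-partial correlation
unique_var = sr ** 2                       # unique variance in y due to x1
```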

Bias in Regression

  • Sources of Bias:
    • Violations of model assumptions and the presence of outliers can substantially bias regression estimates:
    • Assumptions:
      1. Linearity: The relationship between variables must be linear.
      2. Normality: Residuals must be normally distributed.
      3. Homoscedasticity: Variance of Y should be constant across levels of X.
      4. Independence: Observations must be independent.
      5. No Multicollinearity: Predictors should not be highly correlated.
    • Outliers: Extreme values can disproportionately influence results by skewing estimates and standard errors.

Identifying Outliers

  • Diagnostics and Metrics:
    • Residuals: Differences between observed and predicted values; standardize them to judge how extreme each case is:
      • About 95% of standardized residuals should fall within ±2.
    • Mahalanobis Distance: Helps identify outliers based on multivariate distance from the centroid.
    • Cook’s Distance: Assesses the impact of each observation on the overall regression results.
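Standardized residuals and Cook's distance can both be computed from the hat (projection) matrix. A NumPy sketch with one deliberately planted outlier (all data simulated):

```python
# Outlier diagnostics: standardized residuals and Cook's distance.
import numpy as np

rng = np.random.default_rng(2)
n = 103
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
y = X @ np.array([50.0, 0.3, -0.2]) + rng.normal(0, 5, n)
y[5] += 40                                   # plant one gross outlier

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix
h = np.diag(H)                               # leverage of each observation
p = X.shape[1]                               # parameters (incl. intercept)
s2 = resid @ resid / (n - p)                 # residual variance estimate

std_resid = resid / np.sqrt(s2 * (1 - h))    # internally studentized residuals
cooks_d = std_resid**2 * h / (p * (1 - h))   # influence of each observation

flagged = np.abs(std_resid) > 2              # ~5% of clean data expected beyond ±2
```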

Checking Assumptions

  • Independence: Check with the Durbin-Watson test for autocorrelation in residuals.
  • Normality of Residuals: Assess using histograms and Q-Q plots.
  • Linearity: Verify through scatter plots of standardized predicted vs. standardized residuals.
  • Homoscedasticity: Visualize spread in residuals against predicted values (look for randomness).
  • No Multicollinearity: Calculate variance inflation factor (VIF); VIF > 10 indicates a problem.
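Two of the checks above can be sketched numerically: the Durbin-Watson statistic from consecutive residual differences, and the VIF by regressing each predictor on the others. A NumPy sketch on simulated data, with the two predictors made nearly collinear on purpose so the VIF rule of thumb triggers:

```python
# Durbin-Watson and variance inflation factors, using NumPy only.
import numpy as np

rng = np.random.default_rng(3)
n = 103
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)     # nearly collinear with x1 by design
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Durbin-Watson: values near 2 suggest no first-order autocorrelation.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def vif(X, j):
    """VIF for column j: 1 / (1 - R² from regressing column j on the rest)."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid_j = X[:, j] - others @ coef
    r2_j = 1 - np.sum(resid_j ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2_j)

vif1, vif2 = vif(X, 1), vif(X, 2)            # both far above 10 here, flagging trouble
```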

Model Evaluation

  • Final Steps: Once the assumptions have been checked and any outliers assessed, the model can be treated as valid and interpretation can proceed.
    • Significant regression findings can then be accepted and their practical implications discussed.