
Regression Part 4

Introduction to Regression
  • Previous Discussion: The last session introduced the regression model, focusing on predicting the dependent variable (y) from one independent variable (x).

  • Current Focus: The lecture will extend the discussion to multiple regression, which involves predicting one dependent variable (y) from two or more independent variables (predictors).

Multiple Regression Overview
1. Type of Data
  • Dependent Variable: One quantitative dependent variable (y).

  • Predictors: Two or more independent variables, which can be either quantitative or dichotomous.

  • Note: For a dichotomous dependent variable, use logistic (logit) regression, which likewise accommodates multiple predictors (a minimal sketch follows below).
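As a brief aside, a minimal sketch of what logit regression looks like in practice, assuming a statsmodels workflow; the data and variable names below are made up for illustration and are not the course dataset:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: a 0/1 outcome driven by two quantitative predictors.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = (x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=100) > 0).astype(int)

X = sm.add_constant(x)                 # adds the intercept column
logit_fit = sm.Logit(y, X).fit()       # logit regression with multiple predictors
print(logit_fit.summary())             # coefficients are on the log-odds scale
```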

2. Purpose and Use
  • Prediction: To predict the value of the dependent variable (y).

  • Understanding Relationships: To comprehend the relationships between the predictors (x's) and the dependent variable (y).

3. Regression Equation
  • The equation for multiple regression is expressed as:

    \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k

  • Note: The equations for the slopes are complex and not displayed for simplicity.

4. Example Application
  • Example: Predicting the market value of a home based on factors such as size, age, number of bedrooms, etc.
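As an illustrative sketch of fitting such a model, assuming statsmodels and entirely hypothetical housing data (the column names and numbers are made up):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical housing data: market value predicted from size, age, and bedrooms.
homes = pd.DataFrame({
    "value":    [310, 405, 289, 512, 350, 470, 295, 388],          # in $1000s
    "size":     [1600, 2100, 1450, 2800, 1750, 2500, 1500, 2000],  # square feet
    "age":      [25, 10, 40, 5, 18, 8, 35, 15],                    # years
    "bedrooms": [3, 4, 2, 5, 3, 4, 3, 4],
})

X = sm.add_constant(homes[["size", "age", "bedrooms"]])  # b0 + b1*size + b2*age + b3*bedrooms
model = sm.OLS(homes["value"], X).fit()
print(model.params)      # the b coefficients
print(model.rsquared)    # proportion of variance in value explained
```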

5. Important Consideration: Multicollinearity
  • Definition: Multicollinearity refers to the situation where independent variables (x's) are correlated with each other.

  • Implications: It affects the interpretation of results and complicates analysis.

  • Underlying Principle: The logic is the same as in simple regression, only with multiple predictors.

    • Correlation among predictors can complicate the interpretation of coefficient estimates.

Application Example: Nexus Connections Case Output
Regression Output Details
  • Variables Used: All potential predictors (gender, minority status, marital status, age, tenure, rating) were included to predict salary (y).

  • R-Square Value: R² = 72.3%, indicating the proportion of the variance in salary explained by the predictors.

  • Sample Size: n = 140 observations.

  • 10:1 Rule: The guideline calls for at least 10 observations per predictor; with 140 observations and 6 predictors the ratio is about 23:1, which is sufficient.

  • Adjusted R-Square: Differs only slightly from the unadjusted R² (recomputed in the sketch below).
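A quick recomputation of the adjusted R² and the observations-per-predictor ratio from the figures reported above, using the standard adjustment formula; the output's exact adjusted R² is not recorded in these notes, so treat the result as approximate (the R² input is rounded):

```python
# Recompute adjusted R^2 from the reported (rounded) R^2, n, and k.
r2, n, k = 0.723, 140, 6

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))   # ~0.711, close to the unadjusted 0.723
print(round(n / k, 1))    # ~23.3 observations per predictor: the 10:1 rule is met
```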

ANOVA Table and Hypothesis Testing
  • Purpose of ANOVA Test: To determine whether the model as a whole explains a significant amount of the variance in y.

    • Null Hypothesis (H₀): R² = 0 (the model does not explain variance).

    • Alternative Hypothesis (H₁): R² > 0 (the model does explain variance).

  • P-Value Assessment: In this case, Significance F (the p-value of the overall F test) is 9.8 × 10⁻³⁵, substantially lower than common alpha levels (0.05, 0.01, or 0.005); the sketch below recomputes it from R².

    • Conclusion: Reject H₀; the predictors significantly explain variance in salary.
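A minimal sketch of where Significance F comes from, recomputed from the rounded R², n, and number of predictors reported above; because of rounding, the result only lands in the same vanishingly small neighborhood as the printed value rather than matching it exactly:

```python
from scipy import stats

# Overall F test for the model: H0 says the population R-squared is 0.
r2, n, k = 0.723, 140, 6                       # values from the Nexus Connections output (R2 rounded)

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # F = (R^2/k) / ((1 - R^2)/(n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)     # right-tail probability

print(f"F({k}, {n - k - 1}) = {f_stat:.1f}, p = {p_value:.3g}")
# A p-value this far below any common alpha means we reject H0:
# the predictors jointly explain a significant share of the variance in salary.
```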

Gathering Coefficients from the Regression Table
  • A list of coefficients (b's) allows further interpretation of predictor impacts.

  • Each coefficient can be tested for statistical significance using a t-test (sketched after this list): t = \frac{b_i}{s_{b_i}}

  • Hypotheses:

    • H₀: βᵢ = 0 (coefficient has no effect)

    • H₁: βᵢ ≠ 0 (coefficient has an effect)

    • Compare p-value against alpha (α) to decide on coefficients' significance.
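A small sketch of the per-coefficient t-test, assuming statsmodels and simulated data (the numbers are hypothetical); it simply confirms that dividing each estimate by its standard error reproduces the t statistics in the output:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data; the point is the per-coefficient t-test, not these numbers.
rng = np.random.default_rng(1)
X = rng.normal(size=(140, 3))
y = 50 + 4 * X[:, 0] + 0 * X[:, 1] + 2 * X[:, 2] + rng.normal(scale=5, size=140)

fit = sm.OLS(y, sm.add_constant(X)).fit()

# t = b_i / s_{b_i}: each estimate divided by its standard error.
t_manual = fit.params / fit.bse
print(np.allclose(t_manual, fit.tvalues))   # True: matches statsmodels' t statistics
print(fit.pvalues)                          # compare each p-value to alpha to retain or reject H0
```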

Regression Equation Example
Equation
  • \hat{y} = 548 + 30.9G + 44.9M + 8.6Ma - 0.06A + 62.4T + 129R, where G = gender, M = minority status, Ma = marital status, A = age, T = tenure, and R = rating.

  • Significance Testing: Each predictor's coefficient must be examined (see the worked sketch after this list):

    • If p-value < α, reject H₀ (coefficient is significantly predicting y).

    • If p-value ≥ α, retain H₀ (coefficient is not significant).
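As a worked illustration of using this equation, a tiny sketch that plugs hypothetical inputs into the fitted coefficients; the 0/1 coding of the dichotomous predictors and the example input values are assumptions, not figures from the case:

```python
# Prediction from the fitted equation; the inputs below are made up
# purely to illustrate plugging numbers into the coefficients.
def predict_salary(G, M, Ma, A, T, R):
    return 548 + 30.9 * G + 44.9 * M + 8.6 * Ma - 0.06 * A + 62.4 * T + 129 * R

# Two hypothetical employees identical except for minority status (M).
a = predict_salary(G=1, M=0, Ma=1, A=35, T=10, R=4)
b = predict_salary(G=1, M=1, Ma=1, A=35, T=10, R=4)

print(a, b, round(b - a, 1))   # the difference equals the net coefficient on M, 44.9
```

This is the same logic as the "two candidates" comparison discussed under the interpretation of coefficients below: holding everything else constant, the predicted gap is exactly the coefficient on the variable that differs.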

Interpretation of Multiple Regression Results
Y-Intercept
  • Meaning: The y-intercept (b₀) is the predicted value of y when every predictor equals zero, a scenario that is seldom meaningful in practice, so b₀ usually carries no substantive interpretation.

Individual Coefficients
  • Gross Impact vs. Net Impact:

    • Gross Impact: Captured in simple regression; the effect of an individual predictor without accounting for the others.

    • Net Impact: Reflects the effect of a predictor while controlling for the impact of other variables, generally weaker than gross impact.

Understanding Multicollinearity
What is Multicollinearity?
  • Multicollinearity indicates overlapping information among predictors, which complicates interpretation.

  • It can be visualized with a Venn diagram showing the variance the predictors share.

Interpretation of Coefficients in Practice
  • The coefficients obtained in multiple regression reflect the impact of each predictor while controlling for the variance accounted for by others.

  • Hypothetical illustration: Comparing two candidates with identical backgrounds except for minority status (as in the prediction sketch above) shows how a coefficient reflects the net difference, with all other variables held constant.

  • Conclusion: A non-significant coefficient does not mean the predictor is unrelated to y; it means the predictor adds little unique predictive power once the other variables are taken into account.

Techniques to Identify Multicollinearity
Through Correlation Matrix
  • A correlation matrix helps identify multicollinearity by revealing correlations among predictors.

  • Significant Correlations: With n = 140, any correlation with an absolute value greater than 0.166 is statistically significant in a two-tailed test (see the sketch below).
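A minimal sketch of this check, assuming pandas and a two-tailed test at α = 0.05 (the alpha level is my assumption; it is what makes 0.166 the critical value at n = 140). The predictor data frame here is simulated purely for illustration:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Critical |r| for a two-tailed test of zero correlation (alpha assumed to be 0.05).
n, alpha = 140, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
r_crit = t_crit / np.sqrt(t_crit**2 + (n - 2))
print(round(r_crit, 3))          # ~0.166 with n = 140

# Hypothetical predictor data frame; in practice use the actual predictor columns.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["age", "tenure", "rating"])
df["tenure"] += 0.8 * df["age"]  # build in some correlation

corr = df.corr()
flags = (corr.abs() > r_crit) & ~np.eye(len(corr), dtype=bool)
print(corr.round(2))
print(flags)                     # True marks predictor pairs that may signal multicollinearity
```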

Comparing Coefficients
  • Coefficients often change substantially from simple to multiple regression, reflecting the shift from gross to net impacts.

  • A summary table contrasts simple regression coefficients (isolated, gross effects) with multiple regression coefficients (estimated jointly, i.e., net of the other predictors).

  • A sizeable drop or reversal in coefficients points to the influence of multicollinearity.

Investigating R² Values
  • Multiple Regression R²: 0.723

  • Sum of Simple-Regression R² Values: The sum exceeds the theoretical maximum of 1.0 because overlapping variance is counted more than once across the separate simple regressions; this overlap is further evidence of multicollinearity and is exactly what multiple regression adjusts for (illustrated in the sketch below).
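A small sketch of this diagnostic on simulated, deliberately correlated predictors (the data are hypothetical; only the pattern matters): the simple-regression R² values double-count shared variance, so their sum can exceed both the multiple-regression R² and 1.0.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical correlated predictors, just to show the diagnostic.
rng = np.random.default_rng(3)
x1 = rng.normal(size=140)
x2 = x1 + rng.normal(scale=0.5, size=140)      # strongly related to x1
x3 = rng.normal(size=140)
y = 2 * x1 + 1.5 * x2 + x3 + rng.normal(size=140)

X = np.column_stack([x1, x2, x3])

# R^2 from each simple regression, one predictor at a time.
simple_r2 = [sm.OLS(y, sm.add_constant(X[:, [j]])).fit().rsquared for j in range(3)]

# R^2 from the multiple regression with all predictors together.
multiple_r2 = sm.OLS(y, sm.add_constant(X)).fit().rsquared

print(sum(simple_r2), multiple_r2)
# The simple R^2 values double-count shared variance, so their sum
# exceeds the multiple-regression R^2 (and here exceeds 1.0).
```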

Techniques to Address Multicollinearity
  • Remove Redundant Predictors: If two or more predictors are highly correlated, consider removing one of them, especially if one is conceptually more important or easier to measure. This reduces redundancy and improves coefficient interpretability.

  • Combine Predictors: Create a composite variable or an index from multiple highly correlated predictors if they represent a similar underlying construct.

  • Increase Sample Size: While not always practical, a larger sample size can sometimes mitigate the impact of multicollinearity by providing more stable coefficient estimates.

  • Use Advanced Techniques: For more severe cases, techniques like Principal Component Analysis (PCA) or regularized regression (e.g., Ridge Regression, Lasso Regression) can be employed, though these are typically beyond basic multiple regression.
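For completeness, a minimal sketch of one of the advanced options mentioned above, ridge regression with scikit-learn; the data, the near-duplicate predictor, and the penalty value alpha are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data with two nearly redundant predictors.
rng = np.random.default_rng(4)
x1 = rng.normal(size=140)
x2 = x1 + rng.normal(scale=0.05, size=140)     # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(size=140)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)            # alpha is the shrinkage penalty

print(ols.coef_)     # OLS coefficients can be unstable when predictors overlap heavily
print(ridge.coef_)   # ridge shrinks them toward each other, stabilizing the estimates
```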

Conclusion
  • Multiple regression is a powerful tool for predicting a dependent variable from multiple predictors and understanding their net impacts.

  • Vigilance regarding multicollinearity is crucial for accurate interpretation and reliable model building. Addressing multicollinearity ensures that the individual coefficients provide meaningful insights into the unique contribution of each predictor.