Multiple Linear Regression: Key Concepts and Terminology

Introduction to Multiple Regression Analysis

  • Multiple Regression Analysis (MRA) is used to model the relationship between a dependent variable and two or more independent variables.
  • It is useful when more than one factor influences the outcome.

Basic Concepts of Multiple Regression

  • Adding independent variables can help explain variation in the dependent variable that fewer variables leave unexplained.
  • MRA rests on the following assumptions:
    • Model errors (𝜖) are statistically independent and normally distributed.
    • The variance of the errors is the same across all levels of the independent variables.
    • The mean of the dependent variable (y) is a linear function of the independent variables (population regression model).

The Basic Model

  • Mathematical representation: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + … + \beta_k x_k + \epsilon
    • Where
    • \beta_0 = Intercept
    • \beta_1, \beta_2, …, \beta_k are the coefficients for each independent variable.
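As a rough illustration of the basic model above, the coefficients can be estimated by least squares. This is a minimal sketch using NumPy; the function name `fit_ols` and the data are hypothetical and synthetic (generated without noise so the known coefficients are recovered), not taken from any study.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Synthetic, illustrative data: two independent variables, no error term.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

coefs = fit_ols(X, y)
print(coefs)  # intercept first, then one slope per independent variable
```

With real data the error term is not zero, so the estimates would only approximate the population coefficients.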

Model Specification and Building

  1. Model Specification: Define dependent variable and select independent variables.
  2. Model Building: Construct the mathematical equation using independent variables to explain variation in the dependent variable.

Example: First City Real Estate

  • The firm conducts a study with 319 complete data points.
    • Dependent Variable: Sales Price
    • Independent Variables:
    • x_1 = Home size (sq. ft)
    • x_2 = Age of house (years)
    • x_3 = Number of bedrooms
    • x_4 = Number of bathrooms
    • x_5 = Garage size (number of cars)
  • Regression Model Formulated as:
    y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \epsilon

Model Testing and Validation

  • Model Diagnosis: Assess the quality of the regression model using R-squared, the standard error, and checks for multicollinearity.
    • R-squared (R²): Indicates the proportion of variation in the dependent variable explained by the model (e.g., R² = 0.8161 means about 81.6% of the variation in sales price is explained).
    • Standard Error: Measures the dispersion of observed values around the predicted values; used to assess the model's accuracy.
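The two diagnostics above can be computed directly from observed and predicted values. The sketch below assumes the usual definitions, R² = 1 − SSE/SST and standard error = sqrt(SSE / (n − k − 1)) with k independent variables; the function name and the numbers are hypothetical.

```python
import numpy as np

def r_squared_and_std_error(y, y_hat, k):
    """Return (R², standard error of the estimate) for a model with k independent variables."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)       # unexplained (residual) variation
    sst = np.sum((y - y.mean()) ** 2)    # total variation around the mean
    r2 = 1.0 - sse / sst
    std_error = np.sqrt(sse / (n - k - 1))
    return r2, std_error

# Hypothetical observed and predicted values, for illustration only.
y = [10.0, 12.0, 14.0, 16.0, 18.0]
y_hat = [10.5, 11.5, 14.0, 16.5, 17.5]

r2, std_error = r_squared_and_std_error(y, y_hat, k=2)
print(r2, std_error)
```

Here SSE = 1.0 and SST = 40.0, so R² = 0.975 and the standard error is sqrt(1.0 / 2).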

Multicollinearity

  • Occurs when two or more independent variables are highly correlated, leading to redundancy in information.
  • Symptoms include:
    • Unexpected signs on coefficient estimates.
    • Large changes in coefficient estimates when variables are added or removed.
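One way to look for the redundancy described above is to inspect pairwise correlations among the independent variables. The sketch below also computes variance inflation factors (VIF), a common diagnostic that these notes do not mention and that is included here as an assumption; the data are synthetic and the function name is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coefs
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic data: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)   # pairwise correlation matrix
vifs = variance_inflation_factors(X)
print(corr.round(3))
print(vifs.round(1))
```

In this setup the correlation between x1 and x2 is close to 1 and their VIFs are large, while the VIF for x3 stays near 1.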

Including Qualitative Variables

  • Qualitative variables can be included through Dummy Variables, which take the value 1 when a characteristic is present and 0 otherwise.
  • Dummy Variable Trap: Including a dummy variable for every category creates perfect multicollinearity with the intercept; for a variable with c categories, use only c − 1 dummy variables.
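The dummy variable trap can be seen by comparing the rank of the design matrix with and without a dropped category. This is a minimal sketch; the category labels and the helper `dummy_columns` are hypothetical.

```python
import numpy as np

def dummy_columns(labels, drop_first=True):
    """Encode labels as 0/1 columns; optionally drop the first category as the baseline."""
    categories = sorted(set(labels))
    kept = categories[1:] if drop_first else categories
    cols = np.array([[1.0 if lab == c else 0.0 for c in kept] for lab in labels])
    return cols, kept

# Hypothetical qualitative variable with three categories.
labels = ["north", "south", "east", "south", "north", "east"]
ones = np.ones((len(labels), 1))  # intercept column

full, _ = dummy_columns(labels, drop_first=False)      # 3 dummies: the trap
reduced, kept = dummy_columns(labels, drop_first=True)  # 2 dummies: c - 1

# With all three dummies, the dummy columns sum to the intercept column,
# so the 4-column design matrix has rank 3 (perfect multicollinearity).
rank_full = np.linalg.matrix_rank(np.hstack([ones, full]))
rank_reduced = np.linalg.matrix_rank(np.hstack([ones, reduced]))
print(rank_full, rank_reduced, kept)
```

The dropped category ("east" here, because categories are sorted alphabetically) becomes the baseline against which the remaining coefficients are interpreted.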

Nonlinear Relationships

  • MRA usually assumes linear relationships, but many situations may require polynomial regression models to address curvilinear patterns.
    • Higher-order polynomial terms (e.g., x², x³) can be added to the regression model to capture these relationships.
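A second-order polynomial model is still fitted by ordinary least squares; the squared term is simply added as another column. The sketch below uses synthetic, noise-free data with a known curve, so all values are illustrative assumptions.

```python
import numpy as np

# Synthetic curvilinear data: y = 2 + 0.5*x - 1.5*x^2 (no error term).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x - 1.5 * x**2

# Straight-line model: intercept and x only.
linear_design = np.column_stack([np.ones_like(x), x])
linear_coefs, *_ = np.linalg.lstsq(linear_design, y, rcond=None)
sse_linear = np.sum((y - linear_design @ linear_coefs) ** 2)

# Second-order polynomial model: intercept, x, and x^2.
quad_design = np.column_stack([np.ones_like(x), x, x**2])
quad_coefs, *_ = np.linalg.lstsq(quad_design, y, rcond=None)
sse_quad = np.sum((y - quad_design @ quad_coefs) ** 2)

print(quad_coefs)            # estimates for the intercept, x, and x^2 terms
print(sse_linear, sse_quad)  # the straight line leaves far more unexplained variation
```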

Model Aptness Assessment

  1. Check independence of model errors (𝜖).
  2. Confirm that errors are normally distributed and homoscedastic (constant variance).
  3. Assess residual patterns visually to verify linearity and independence.
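Some of these checks can be supplemented with simple numeric summaries of the residuals. The sketch below is an assumption, not a method named in these notes: it computes the Durbin-Watson statistic (values near 2 suggest no first-order autocorrelation) and a rough ratio of residual variance between high and low fitted values. Function names and inputs are hypothetical.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by that in the lower half."""
    fitted = np.asarray(fitted, dtype=float)
    e = np.asarray(residuals, dtype=float)
    order = np.argsort(fitted)
    half = len(e) // 2
    low, high = e[order[:half]], e[order[half:]]
    return np.var(high) / np.var(low)

# Hypothetical residuals in observation order.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strong positive autocorrelation

# Hypothetical residuals whose spread grows with the fitted value.
ratio = spread_ratio([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 2.0, -2.0])
print(dw_alternating, dw_constant, ratio)
```

A spread ratio far from 1 hints at non-constant variance, though a residual plot remains the more informative check.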