Multiple Linear Regression: Key Concepts and Terminology
Introduction to Multiple Regression Analysis
- Multiple Regression Analysis (MRA) models the relationship between one dependent variable and two or more independent variables.
- Useful in modeling situations where more than one factor influences outcomes.
Basic Concepts of Multiple Regression
- Adding independent variables can explain variation in the dependent variable that a single-variable model leaves unexplained.
- MRA rests on the following assumptions:
- Model errors (ε) should be statistically independent and normally distributed.
- The variance of the errors should be the same across all levels of the independent variables.
- The mean of the dependent variable (y) should be a linear function of the independent variables (population regression model).
The Basic Model
- Mathematical representation:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + … + \beta_k x_k + \epsilon
- Where
- \beta_0 = Intercept
- \beta_1, \beta_2, …, \beta_k are the coefficients for each independent variable.
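As a rough illustration of how the coefficients in this model can be estimated, the sketch below fits the equation by ordinary least squares with NumPy. The helper name `fit_ols` and the synthetic data are assumptions made for this example; they do not come from the notes.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Synthetic data (hypothetical, for illustration only):
# y = 2 + 3*x1 - 1*x2 plus a small random error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

b = fit_ols(X, y)
print(b)  # estimates of the intercept and the two slopes
```

With low noise the estimates should land close to the values used to generate the data, which is a quick sanity check that the design matrix is set up correctly.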
Model Specification and Building
- Model Specification: Define dependent variable and select independent variables.
- Model Building: Construct the mathematical equation using independent variables to explain variation in the dependent variable.
Example: First City Real Estate
- The firm conducts a study with 319 complete data points.
- Dependent Variable: Sales Price
- Independent Variables:
- x_1 = Home size (sq. ft)
- x_2 = Age of house (years)
- x_3 = Number of bedrooms
- x_4 = Number of bathrooms
- x_5 = Garage size (number of cars)
- Regression model formulated as:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \epsilon
Model Testing and Validation
- Model Diagnosis: Analyze quality of the regression model using R-squared, standard error, and checks for multicollinearity.
- R-squared (R²): Indicates the proportion of variance in the dependent variable explained by the model (e.g., R² = 0.8161 means about 81.6% of the variation in sales price is explained).
- Standard Error: Measures the dispersion of the observed values around the predicted values and is used to assess the model's accuracy.
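The two diagnostics above can be computed directly from observed and predicted values. The sketch below assumes the usual textbook definitions, which the notes do not spell out: R² = 1 − SSE/SST, and standard error of the estimate = sqrt(SSE / (n − k − 1)), where k is the number of independent variables. The function name and the numbers are hypothetical.

```python
import numpy as np

def r_squared_and_std_error(y, y_hat, k):
    """Return (R-squared, standard error of the estimate) for k independent variables."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)     # unexplained (error) sum of squares
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - sse / sst
    std_error = np.sqrt(sse / (n - k - 1))
    return r2, std_error

# Hypothetical observed and predicted values, for illustration only.
y_obs = [3, 5, 7, 9, 11, 13]
y_pred = [3.5, 4.5, 7.5, 8.5, 11.5, 12.5]
r2, s = r_squared_and_std_error(y_obs, y_pred, k=1)
print(r2, s)
```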
Multicollinearity
- Occurs when two or more independent variables are highly correlated with each other, so they carry redundant information.
- Symptoms include:
- Unexpected signs on coefficient estimates.
- Large changes in coefficient estimates when variables are added or removed.
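One simple way to screen for this problem is to inspect the pairwise correlations among the independent variables. The sketch below is a minimal version of that check; the function name, the 0.8 cutoff, and the synthetic data are assumptions for illustration, not values taken from the notes.

```python
import numpy as np

def highly_correlated_pairs(X, names, threshold=0.8):
    """List pairs of columns whose absolute correlation meets the threshold."""
    # rowvar=False treats each column of X as one variable.
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic predictors (hypothetical): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)
x3 = rng.normal(size=300)

pairs = highly_correlated_pairs(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
print(pairs)
```

Pairwise correlations only catch redundancy between two variables at a time, so this is a first screen rather than a complete diagnosis.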
Including Qualitative Variables
- Qualitative variables can be included through Dummy Variables: assigned values of 0 or 1 based on presence of a characteristic.
- Dummy Variable Trap: Including a dummy variable for every category alongside the intercept causes perfect multicollinearity. For a qualitative variable with c categories, use c − 1 dummy variables.
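The encoding and the trap can both be sketched in a few lines. The example below builds dummy columns for every category except a chosen baseline, then checks the rank of the design matrix with and without the extra baseline column. The function name and the category labels are hypothetical.

```python
import numpy as np

def dummy_encode(values, baseline):
    """Return (kept_categories, 0/1 matrix) with one column per non-baseline category."""
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    kept = [c for c in categories if c != baseline]
    matrix = np.array([[1.0 if v == c else 0.0 for c in kept] for v in values])
    return kept, matrix

# Hypothetical qualitative variable with three categories.
values = ["brick", "wood", "stucco", "wood", "brick"]
kept, D = dummy_encode(values, baseline="brick")

ones = np.ones(len(values))
# Intercept plus c - 1 dummies: every column is linearly independent.
rank_ok = np.linalg.matrix_rank(np.column_stack([ones, D]))
# Adding the baseline dummy as well makes the dummies sum to the intercept column.
baseline_dummy = 1.0 - D.sum(axis=1)
rank_trap = np.linalg.matrix_rank(np.column_stack([ones, D, baseline_dummy]))
print(kept, rank_ok, rank_trap)
```

In the second design matrix there are four columns but the rank stays at three, which is the perfect multicollinearity the trap refers to.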
Nonlinear Relationships
- MRA usually assumes linear relationships, but many situations may require polynomial regression models to address curvilinear patterns.
- Higher-order polynomial terms (e.g., x²) can be added to the regression model to capture these relationships.
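A polynomial model is still linear in its coefficients, so it can be fitted with the same least-squares machinery by adding powers of x as extra columns. The sketch below shows this for a quadratic; the function name and the data are assumptions for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Fit y = b0 + b1*x + ... + b_degree*x^degree by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns are x^0, x^1, ..., x^degree, so the first coefficient is the intercept.
    design = np.vander(x, N=degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Hypothetical curvilinear data: an exact quadratic with no random error.
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

coeffs = fit_polynomial(x, y, degree=2)
print(coeffs)  # intercept, linear coefficient, quadratic coefficient
```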
Model Aptness Assessment
- Check independence of model errors (ε).
- Confirm that errors are normally distributed and homoscedastic (constant variance).
- Assess residual patterns visually to verify linearity and independence.
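Visual checks can be supplemented with simple numeric summaries of the residuals. The sketch below computes two of them: the Durbin-Watson statistic, a common check for first-order autocorrelation that is not named in these notes, and the ratio of residual spread between high and low fitted values as a rough homoscedasticity check. The function name and the synthetic residuals are assumptions.

```python
import numpy as np

def residual_checks(fitted, residuals):
    """Return (Durbin-Watson statistic, high/low residual spread ratio)."""
    fitted = np.asarray(fitted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    # Durbin-Watson: values near 2 suggest no first-order autocorrelation.
    # Residuals must be in their original observation order.
    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
    # Split residuals by fitted value and compare their standard deviations.
    order = np.argsort(fitted)
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    spread_ratio = high.std(ddof=1) / low.std(ddof=1)
    return dw, spread_ratio

# Synthetic residuals (hypothetical): independent, constant-variance noise.
rng = np.random.default_rng(2)
fitted = np.linspace(0, 10, 400)
residuals = rng.normal(size=400)

dw, spread_ratio = residual_checks(fitted, residuals)
print(dw, spread_ratio)
```

A Durbin-Watson value far from 2 or a spread ratio far from 1 would be a reason to look more closely at the residual plots; neither number replaces the visual assessment.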