Introduction to Multiple Linear Regression
- Concept: Extends simple linear regression to include multiple predictors.
- Objective: Predict an outcome variable (e.g., the number of active physicians in a county) from several inputs at once.
Key Points
- Predictors: Not limited to a fixed set; any factor thought to influence the response variable can be included.
- Statistical Inference: t-tests on the coefficients (and their p-values) identify which predictors are significant once the others are already in the model.
Example of Predictors
- Total population
- Land area
- Total personal income
- Percentage of the population over 65 years old
Multiple Regression Defined
- Overview: Uses more than one predictor variable, giving a more complete picture of what influences the outcome.
- Application: Commonly used in housing data to predict sales prices.
- Housing Data Considerations:
- Size of the house
- Neighborhood crime rates
- Number of rooms
- House age and time since the last remodel
Variables in Predicting House Prices
- Binary Variables: e.g., whether the house has a backyard, coded as 1 (yes) or 0 (no).
- Numerical Variables: e.g., age of the house, number of garage spaces (see the sketch after this list).
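A minimal sketch of how a binary predictor can sit next to numerical ones in the same data set, assuming pandas is available; the column names (price, sqft, age, backyard) and values are hypothetical, not from the lecture.

```python
# A minimal sketch: a binary predictor recoded as 1/0 next to numerical predictors.
# Column names and values are hypothetical.
import pandas as pd

houses = pd.DataFrame({
    "price":    [250_000, 310_000, 190_000, 420_000],
    "sqft":     [1_400, 1_800, 1_100, 2_600],
    "age":      [32, 15, 41, 5],
    "backyard": ["yes", "yes", "no", "yes"],
})

# Recode the yes/no column as 1/0 so it can enter the regression directly.
houses["backyard"] = (houses["backyard"] == "yes").astype(int)
print(houses)
```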
Correlation Between Variables
- Collinearity: Occurs when two or more predictors are highly correlated with each other.
- Impact on Model: Redundant predictors inflate the standard errors of the coefficients and make individual effects hard to interpret, even if overall predictions are unaffected (a quick check is sketched below).
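One common way to screen for collinearity is a pairwise correlation matrix, optionally followed by variance inflation factors. The sketch below uses simulated data in which rooms is built directly from sqft, so their correlation is high by construction; the variable names, numbers, and use of numpy/pandas/statsmodels are all assumptions for illustration.

```python
# Screening simulated predictors for collinearity; all values are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
sqft = rng.normal(1800, 400, 200)
rooms = sqft / 300 + rng.normal(0, 0.5, 200)   # built from sqft, so highly correlated
age = rng.uniform(0, 50, 200)

X = pd.DataFrame({"sqft": sqft, "rooms": rooms, "age": age})
print(X.corr().round(2))   # pairwise correlations near +/-1 flag redundancy

# Variance inflation factors: values well above roughly 5-10 suggest problematic collinearity.
Xc = sm.add_constant(X)
for i, name in enumerate(X.columns, start=1):
    print(name, round(variance_inflation_factor(Xc.values, i), 1))
```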
Building the Regression Model
- Model Structure: ŷ = β0 + β1·x1 + β2·x2 + … + βk·xk (an intercept plus one coefficient per predictor)
- Visualization:
- With one predictor the fit is a straight line; with two predictors it becomes a plane, and with more predictors a hyperplane (a fitting sketch follows this list).
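A hedged sketch of fitting such a model with statsmodels OLS on simulated housing-style data; the predictor names and the coefficients used to generate the data are assumptions, not taken from the lecture.

```python
# Fitting a multiple regression with statsmodels OLS on simulated housing-style data;
# the names and generating coefficients are made up for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
sqft = rng.normal(1800, 400, n)
age = rng.uniform(0, 50, n)
backyard = rng.integers(0, 2, n)   # binary predictor, already coded 1/0
price = (50_000 + 120 * sqft - 800 * age
         + 15_000 * backyard + rng.normal(0, 20_000, n))

X = sm.add_constant(pd.DataFrame({"sqft": sqft, "age": age, "backyard": backyard}))
model = sm.OLS(price, X).fit()
print(model.summary())   # intercept, one coefficient per predictor, t-tests, R-squared
```

`model.summary()` shows the intercept, each coefficient with its t-test and p-value, and R-squared in a single table.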
Case Study: Predicting Birth Weight
- Example Predictors:
- Weeks of pregnancy
- Mother's age
- Mother's weight gain
- Smoking habits
Model Interpretation
- Intercept Understanding: The intercept is the predicted outcome when every predictor equals zero; this baseline often has no meaningful interpretation (e.g., zero weeks of gestation).
- Coefficient Interpretation: Each coefficient is the expected change in the outcome for a one-unit increase in that predictor, holding the other predictors constant.
- Example: A positive coefficient for 'weeks' means the expected birth weight rises by that amount for each additional week of gestation, with mother's age, weight gain, and smoking status held fixed (see the sketch after this list).
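The sketch below mimics the birth-weight case with simulated data to show how the fitted 'weeks' coefficient is read as a per-week change in expected weight with the other predictors held fixed; the generating coefficients and variable names are invented, and statsmodels is an assumed tool.

```python
# Simulated birth-weight data; the generating coefficients below are invented
# purely to illustrate how the fitted 'weeks' coefficient is interpreted.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
weeks = rng.normal(39, 2, n)          # weeks of gestation
mom_age = rng.normal(28, 5, n)
weight_gain = rng.normal(30, 8, n)    # pounds gained during pregnancy
smoker = rng.integers(0, 2, n)

birth_weight = (-2000 + 230 * weeks + 2 * mom_age + 6 * weight_gain
                - 180 * smoker + rng.normal(0, 300, n))   # grams

X = sm.add_constant(pd.DataFrame({
    "weeks": weeks, "mom_age": mom_age,
    "weight_gain": weight_gain, "smoker": smoker,
}))
fit = sm.OLS(birth_weight, X).fit()

# The 'weeks' estimate is the expected change in birth weight (grams) for one
# additional week of gestation, holding the other predictors constant.
print(fit.params.round(1))
print(fit.pvalues.round(4))
```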
R-Squared Analysis
- Definition: R-squared is the proportion of variability in the outcome explained by the model.
- Change in R-Squared: Adding a predictor never lowers R-squared, so adjusted R-squared is a better guide when comparing models with different numbers of predictors (compared in the sketch below).
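A small comparison of R-squared and adjusted R-squared as predictors are added, again on simulated data with hypothetical names; plain R-squared can only creep up, even when an unrelated noise predictor is added, while adjusted R-squared penalizes the extra term and typically stays flat or drops.

```python
# Comparing R-squared and adjusted R-squared as predictors are added;
# the simulated data and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
weeks = rng.normal(39, 2, n)
weight_gain = rng.normal(30, 8, n)
noise = rng.normal(0, 1, n)   # unrelated to the outcome
y = -2000 + 230 * weeks + 6 * weight_gain + rng.normal(0, 300, n)

def r2_report(cols):
    X = sm.add_constant(pd.DataFrame(cols))
    res = sm.OLS(y, X).fit()
    return round(res.rsquared, 3), round(res.rsquared_adj, 3)

print(r2_report({"weeks": weeks}))
print(r2_report({"weeks": weeks, "weight_gain": weight_gain}))
print(r2_report({"weeks": weeks, "weight_gain": weight_gain, "noise": noise}))
```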
Conclusion
- Model Diagnostics: Always analyze goodness-of-fit and correlations among predictors.
- Importance of Variable Selection: Include only variables that contribute meaningful information; extra predictors add complexity without improving the model.