Concept: Extends simple linear regression to include multiple predictors.
Objective: Predict the outcome variable (e.g., number of active positions) using multiple inputs.
Key Points
Predictors: Not limited to a fixed set; any factor thought to influence the response variable can be included.
Statistical Inference: Used to determine which of several candidate predictors are statistically significant; a minimal fitting sketch follows the example list below.
Example of Predictors
Total population
Land area
Total personal income
Percentage of the population over 65 years old
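As a concrete illustration, here is a minimal sketch of fitting such a model with the four example predictors. The data are simulated and every column name is a hypothetical stand-in, not taken from any dataset referenced in these notes.

```python
# Minimal sketch of a multiple regression using the four example predictors
# above. The data are simulated and every column name here is hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "population":  rng.uniform(50_000, 2_000_000, n),   # total population
    "land_area":   rng.uniform(100, 10_000, n),         # land area
    "income":      rng.uniform(1e9, 5e10, n),           # total personal income
    "pct_over_65": rng.uniform(5, 25, n),                # % of population over 65
})
# Simulated outcome: a linear combination of the predictors plus noise.
df["active_positions"] = (
    0.002 * df["population"] + 1e-8 * df["income"]
    - 0.01 * df["land_area"] + 10 * df["pct_over_65"]
    + rng.normal(0, 500, n)
)

model = smf.ols(
    "active_positions ~ population + land_area + income + pct_over_65",
    data=df,
).fit()
print(model.summary())  # coefficients, p-values for inference, R-squared
```

The summary table is where the statistical inference happens: each predictor gets a coefficient estimate and a p-value indicating whether it contributes significantly given the others.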
Multiple Regression Defined
Overview: Uses more than one predictor variable, so a single model can capture several influences on the outcome at once.
Application: Commonly used with housing data to predict sale prices.
Housing Data Considerations:
Size of the house
Neighborhood crime rates
Number of rooms
Age of the house and time since it was last remodeled
Variables in Predicting House Prices
Binary Variables: E.g., whether a house has a backyard, coded as 1 or 0.
Numerical Variables: E.g., age of the house, number of garage spaces; both kinds appear in the sketch below.
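A hedged sketch of a house-price model mixing binary and numerical predictors. The data are simulated and the column names (size_sqft, crime_rate, rooms, age, has_backyard) are hypothetical.

```python
# Hedged sketch of a house-price model mixing binary and numerical predictors.
# The data are simulated and the column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
houses = pd.DataFrame({
    "size_sqft":    rng.uniform(600, 4_000, n),
    "crime_rate":   rng.uniform(0, 50, n),
    "rooms":        rng.integers(2, 10, n),
    "age":          rng.integers(0, 100, n),
    "has_backyard": rng.integers(0, 2, n),  # binary predictor coded 1/0
})
houses["price"] = (
    150 * houses["size_sqft"] - 2_000 * houses["crime_rate"]
    + 8_000 * houses["rooms"] - 500 * houses["age"]
    + 15_000 * houses["has_backyard"] + rng.normal(0, 20_000, n)
)

fit = smf.ols(
    "price ~ size_sqft + crime_rate + rooms + age + has_backyard",
    data=houses,
).fit()
print(fit.params)  # one coefficient per predictor, plus the intercept
```

The coefficient on a binary predictor such as has_backyard is read as the expected price difference between houses with and without a backyard, holding the other predictors constant.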
Correlation Between Variables
Collinearity: Occurs when two (or more) predictors are highly correlated with each other.
Impact on Model: Redundant predictors inflate the uncertainty of the coefficient estimates and make individual coefficients hard to interpret.
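Reusing the hypothetical `houses` data from the sketch above, one common way to screen for collinearity is to inspect pairwise correlations and variance inflation factors (VIFs):

```python
# Screening for collinearity with pairwise correlations and variance
# inflation factors (VIFs), reusing the hypothetical `houses` data above.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

predictors = houses[["size_sqft", "crime_rate", "rooms", "age", "has_backyard"]]
print(predictors.corr())  # values near +1 or -1 flag highly correlated pairs

X = sm.add_constant(predictors)  # VIF is computed from the full design matrix
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)  # common rule of thumb: VIF above roughly 5-10 suggests trouble
```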
Building the Regression Model
Model Structure: Outcome = Intercept + Coefficient1 × Predictor1 + Coefficient2 × Predictor2 + … + CoefficientN × PredictorN, plus an error term (written out in standard notation below).
Visualization:
With two predictors the fitted relationship is a plane; with more predictors it is a hyperplane rather than a straight line.
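In standard notation, the model with p predictors can be written as:

```latex
% General multiple linear regression model with p predictors
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
```

Fitting the model estimates the beta coefficients from the data; epsilon is the error term, the part of the outcome the predictors leave unexplained.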
Case Study: Predicting Birth Weight
Example Predictors:
Weeks of pregnancy
Mother's age
Mother's weight gain
Smoking habits
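A hedged sketch of the birth-weight case study. The data are simulated and the column names (weeks, mother_age, weight_gained, smoker) are hypothetical stand-ins for the predictors listed above.

```python
# Hedged sketch of the birth-weight case study. The data are simulated and the
# column names (weeks, mother_age, weight_gained, smoker) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
births = pd.DataFrame({
    "weeks":         rng.normal(39, 2, n),     # weeks of pregnancy
    "mother_age":    rng.uniform(18, 45, n),   # mother's age
    "weight_gained": rng.normal(30, 10, n),    # mother's weight gain
    "smoker":        rng.integers(0, 2, n),    # 1 = smoked during pregnancy
})
births["birth_weight"] = (
    -4 + 0.3 * births["weeks"] + 0.01 * births["mother_age"]
    + 0.02 * births["weight_gained"] - 0.4 * births["smoker"]
    + rng.normal(0, 1, n)
)

bw_model = smf.ols(
    "birth_weight ~ weeks + mother_age + weight_gained + smoker",
    data=births,
).fit()
print(bw_model.params)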
Model Interpretation
Intercept Understanding: The intercept is the predicted outcome when every predictor equals zero; this baseline may not be meaningful if zero lies outside the range of the data.
Coefficient Interpretation: Each coefficient is the expected change in the outcome for a one-unit increase in that predictor, holding all other predictors constant.
Example: A positive coefficient for 'weeks' means each additional week of gestation is associated with a higher predicted birth weight, with the other predictors held fixed.
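To make "holding the other predictors constant" concrete, this sketch reuses the hypothetical bw_model from the birth-weight example and predicts two cases that differ only by one week of gestation; the difference equals the fitted 'weeks' coefficient.

```python
# Making "holding other predictors constant" concrete: predict two hypothetical
# cases that differ only by one week of gestation, using bw_model from above.
import numpy as np
import pandas as pd

cases = pd.DataFrame({"weeks": [38, 39], "mother_age": [30, 30],
                      "weight_gained": [30, 30], "smoker": [0, 0]})
preds = np.asarray(bw_model.predict(cases))
# Because the model is linear, the difference is exactly the `weeks` coefficient.
print(preds[1] - preds[0], bw_model.params["weeks"])
```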
R-Squared Analysis
Definition: The proportion of variability in the outcome that is explained by the model's predictors.
Change in R-Squared: Adding or removing predictors changes the proportion of variability explained; R-squared never decreases when a predictor is added, so adjusted R-squared is often used to judge whether the addition is worthwhile.
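A short sketch of this comparison, continuing the simulated birth-weight example above: fit the full model and a reduced model with one predictor dropped, then compare R-squared and adjusted R-squared.

```python
# Comparing R-squared and adjusted R-squared when a predictor is dropped,
# reusing the simulated `births` data from the case-study sketch above.
full = smf.ols(
    "birth_weight ~ weeks + mother_age + weight_gained + smoker", data=births
).fit()
reduced = smf.ols(
    "birth_weight ~ weeks + weight_gained + smoker", data=births
).fit()
print(full.rsquared, full.rsquared_adj)        # full model
print(reduced.rsquared, reduced.rsquared_adj)  # mother_age removed
```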
Conclusion
Model Diagnostics: Always analyze goodness-of-fit and correlations among predictors.
Importance of Variable Selection: Only include variables that contribute meaningful information; extra predictors add complexity without added benefit.
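As one example of a goodness-of-fit diagnostic, a residuals-versus-fitted plot for the simulated birth-weight model above should show no obvious pattern:

```python
# One basic goodness-of-fit diagnostic: residuals versus fitted values for the
# simulated birth-weight model above; a clear pattern would signal a poor fit.
import matplotlib.pyplot as plt

plt.scatter(bw_model.fittedvalues, bw_model.resid, s=8)
plt.axhline(0, color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```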