Introduction to Multiple Linear Regression

  • Concept: Extends simple linear regression to include multiple predictors.
  • Objective: Predict the outcome variable (e.g., number of active physicians) using multiple inputs.

Key Points

  • Predictors: Not limited to a fixed set; can include various factors influencing the response variable.
  • Statistical Inference: Tests on the coefficients help identify which predictors are statistically significant once the others are in the model.

Example of Predictors

  • Total population
  • Land area
  • Total personal income
  • Percentage of the population over 65 years old

Multiple Regression Defined

  • Overview: Uses more than one predictor variable, so a single model can account for several influences on the outcome at once.
  • Application: Commonly applied to housing data to predict sale prices.
  • Housing Data Considerations:
    • Size of the house
    • Neighborhood crime rates
    • Number of rooms
    • Age of the house and time since it was last remodeled

Variables in Predicting House Prices

  • Binary Variables: E.g., whether a house has a backyard (coded as 1 or 0).
  • Numerical Variables: E.g., age of the house, number of garage spaces.
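
The coding of binary and numerical variables can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the four houses and the variable names (has_backyard, age_years, garage_spaces) are invented for the example and do not come from any real dataset.

```python
import numpy as np

# Hypothetical raw data for four houses (values invented for illustration).
has_backyard = ["yes", "no", "yes", "yes"]   # binary variable, recorded as text
age_years = [12, 40, 5, 23]                  # numerical variable
garage_spaces = [2, 0, 1, 2]                 # numerical variable

# Code the binary variable as 1 (has a backyard) or 0 (does not).
backyard = np.array([1.0 if v == "yes" else 0.0 for v in has_backyard])

# One row per house, one column per predictor.
X = np.column_stack([backyard, age_years, garage_spaces])
print(X)
```

Once coded this way, the binary column enters the model like any other predictor, and its coefficient is the predicted difference between the two groups.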

Correlation Between Variables

  • Collinearity: Occurs when two or more predictors are highly correlated with each other.
  • Impact on Model: Redundant predictors make the coefficient estimates unstable and hard to interpret, because the model cannot cleanly separate their individual effects.
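
One way to check for collinearity is sketched below, assuming NumPy and synthetic data invented for this example. It computes the pairwise correlation matrix and a variance inflation factor (VIF). The VIF is a common diagnostic that these notes do not mention, and the helper name vif is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic predictors (invented): rooms is built to track size closely.
size = rng.normal(150, 40, n)
rooms = size / 30 + rng.normal(0, 0.3, n)
age = rng.normal(30, 10, n)               # unrelated to the other two

# Pairwise correlations; rows and columns are size, rooms, age.
corr = np.corrcoef(np.column_stack([size, rooms, age]), rowvar=False)

def vif(target, others):
    """Variance inflation factor: 1 / (1 - R^2) from regressing target on others."""
    X = np.column_stack([np.ones(len(target))] + list(others))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid.var() / target.var()
    return 1 / (1 - r2)

vif_size = vif(size, [rooms, age])
vif_age = vif(age, [size, rooms])
print(np.round(corr, 2))
print(vif_size, vif_age)
```

A correlation near 1 between size and rooms, together with a large VIF for size, signals that the two carry largely the same information; the VIF for age stays close to 1.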

Building the Regression Model

  • Model Structure: Predicted outcome = Intercept + b1 × Predictor1 + b2 × Predictor2 + … + bN × PredictorN, where each b is an estimated coefficient.
  • Visualization:
    • With two predictors the fitted model is a plane, and with more it is a hyperplane, instead of a straight line.
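
Fitting a model of this structure can be sketched with ordinary least squares. The example below assumes NumPy; the data are synthetic and noise-free, and the names (size, rooms, price, true_coefs) are invented for illustration, not taken from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic predictors and outcome (invented for illustration).
size = rng.uniform(50, 250, n)
rooms = rng.integers(1, 8, n).astype(float)
true_coefs = np.array([10.0, 2.0, 5.0])   # intercept, slope for size, slope for rooms
price = true_coefs[0] + true_coefs[1] * size + true_coefs[2] * rooms

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), size, rooms])

# Least-squares estimates of the intercept and the two slopes.
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)
fitted = X @ coefs
print(coefs)
```

Because the outcome here was generated without noise, the estimates match true_coefs up to rounding; with real data they would only approximate the underlying relationship.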

Case Study: Predicting Birth Weight

  • Example Predictors:
    • Weeks of pregnancy
    • Mother's age
    • Mother's weight gain
    • Smoking habits

Model Interpretation

  • Intercept Understanding: The intercept is the predicted outcome when every predictor equals zero; it often has no meaningful interpretation (e.g., zero weeks of pregnancy).
  • Coefficient Interpretation: Each coefficient describes the change in the predicted outcome for a one-unit increase in that predictor, holding all other predictors constant.
    • Example: A positive coefficient for 'weeks' means predicted birth weight increases by that amount for each additional week of gestation, for mothers with the same age, weight gain, and smoking habits.
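
The "holding other predictors constant" reading can be checked numerically. This is a minimal sketch assuming NumPy; the birth-weight data are simulated, and the generating numbers (0.3 per week, -0.4 for smokers) are invented for illustration, not estimates from a real study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Simulated data (invented): weeks of pregnancy and a 0/1 smoking indicator.
weeks = rng.uniform(30, 42, n)
smoker = rng.integers(0, 2, n).astype(float)
weight = -4.0 + 0.3 * weeks - 0.4 * smoker + rng.normal(0, 0.5, n)

# Fit the model with an intercept and both predictors.
X = np.column_stack([np.ones(n), weeks, smoker])
b0, b_weeks, b_smoker = np.linalg.lstsq(X, weight, rcond=None)[0]

def predict(w, s):
    """Predicted birth weight for w weeks and smoking status s (0 or 1)."""
    return b0 + b_weeks * w + b_smoker * s

# Increase weeks by one while keeping smoking status fixed.
diff = predict(39, 1) - predict(38, 1)
print(b_weeks, diff)
```

The difference between the two predictions equals the coefficient for weeks, whichever fixed value of the smoking indicator is used.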

R-Squared Analysis

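  • Definition: The proportion of variability in the outcome that is explained by the model, from 0 (none) to 1 (all).
  • Change in R-Squared: Adding a predictor never lowers R-squared and removing one never raises it, so look at how much the explained percentage changes, not merely whether it changes.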
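
The effect of adding predictors on R-squared can be sketched as follows, assuming NumPy and simulated data invented for this example. The helper r_squared is a hypothetical name; it fits a least-squares model with an intercept and returns 1 minus the ratio of residual to total sum of squares.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

rng = np.random.default_rng(3)
n = 200

# Simulated data (invented): y depends on x1 and x2 but not on unrelated.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
unrelated = rng.normal(size=n)
y = 1 + 2 * x1 + 1 * x2 + rng.normal(size=n)

r2_one = r_squared(x1, y)
r2_two = r_squared(np.column_stack([x1, x2]), y)
r2_three = r_squared(np.column_stack([x1, x2, unrelated]), y)
print(r2_one, r2_two, r2_three)
```

Adding the useful predictor x2 raises R-squared noticeably, while adding the unrelated column raises it only marginally, which is why a small increase alone does not justify keeping a predictor.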

Conclusion

  • Model Diagnostics: Always analyze goodness-of-fit and correlations among predictors.
  • Importance of Variable Selection: Only include variables that contribute meaningful information to the model to avoid complexity without added benefit.