Model Selection
Model Selection Purposes
Choosing explanatory variables based on the model's goals:
Prediction for new observations.
Describing relationships between variables.
Variable Selection Methods
Stepwise Regression: Add/remove variables one by one.
Best Subset Regression: Identify a subset of predictor variables.
Shrinkage: Fit a model with all predictors, shrinking the coefficients of unimportant ones toward zero.
Example Analysis
Factors affecting life expectancy include:
Number of deaths between 15-60 years (per 1000).
Infant deaths (per 1000).
Per capita alcohol consumption.
Health expenditure (% of GDP per capita).
Average BMI.
Data collected from WHO and UN (2014); missing data accounted for.
Model Comparisons
Simple Model:
model.simple <- lm(`Life expectancy` ~ GDP)
Extended Model:
Includes all variables from the dataset.
Comparison of $R^2$ values for model fit.
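A minimal sketch of this comparison, assuming R's built-in mtcars data as a stand-in because the life-expectancy data is not reproduced here. The response (mpg) and the predictor (wt) are illustrative choices, not the variables of the original analysis.

```r
# Stand-in data: mtcars ships with base R (assumption, not the WHO data).
model.simple <- lm(mpg ~ wt, data = mtcars)   # one predictor
model.full   <- lm(mpg ~ ., data = mtcars)    # all remaining columns as predictors

# R^2 never decreases when predictors are added to a nested model,
# so adjusted R^2 is reported alongside it.
r2.simple  <- summary(model.simple)$r.squared
r2.full    <- summary(model.full)$r.squared
adj.simple <- summary(model.simple)$adj.r.squared
adj.full   <- summary(model.full)$adj.r.squared

print(c(simple = r2.simple, full = r2.full))
print(c(simple = adj.simple, full = adj.full))
```

Because the extended model nests the simple one, its plain R² is guaranteed to be at least as large; a higher value alone does not show that the extra variables are useful.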
Stepwise Selection Process
Forward Selection: Begin with single-variable models, then add one variable at a time, keeping the addition that most improves the fit.
Backward Selection: Start with all variables, remove the least contributing one.
Hybrid Approach: Combines forward and backward methods.
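One forward step and one backward step can be sketched by hand with add1() and drop1() from base R. The mtcars data and the four candidate predictors are assumptions chosen only so the example runs without the original dataset.

```r
# Stand-in data and candidate predictors (assumptions for illustration).
null.model <- lm(mpg ~ 1, data = mtcars)                       # intercept only
full.model <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)   # all candidates

# Forward step: AIC after adding each candidate to the null model.
forward.table <- add1(null.model, scope = ~ wt + hp + disp + qsec)
best.add <- rownames(forward.table)[which.min(forward.table$AIC)]

# Backward step: AIC after dropping each term from the full model.
backward.table  <- drop1(full.model)
drop.candidates <- backward.table[rownames(backward.table) != "<none>", ]
best.drop <- rownames(drop.candidates)[which.min(drop.candidates$AIC)]

print(forward.table)
print(backward.table)
```

The "<none>" row is the current model; a step is taken only if some addition or deletion has a lower AIC than that row.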
Variable Selection Criteria
Adjusted R²: Higher values preferred; used for linear models.
Akaike's Information Criterion (AIC): Lower values preferred.
Bayesian Information Criterion (BIC): Lower values preferred.
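The three criteria can be read off a fitted lm object as sketched below. The mtcars data and both model formulas are stand-in assumptions. The manual calculation uses the standard definitions AIC = -2 logLik + 2k and BIC = -2 logLik + log(n) k, where k counts the estimated parameters (including the error variance) and n is the number of observations.

```r
# Stand-in models on built-in data (assumption, not the original analysis).
model.small <- lm(mpg ~ wt, data = mtcars)
model.large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

criteria <- data.frame(
  model  = c("small", "large"),
  adj.r2 = c(summary(model.small)$adj.r.squared,
             summary(model.large)$adj.r.squared),
  AIC    = c(AIC(model.small), AIC(model.large)),
  BIC    = c(BIC(model.small), BIC(model.large))
)
print(criteria)

# Reproduce AIC and BIC for the small model from the log-likelihood.
ll <- logLik(model.small)
k  <- attr(ll, "df")        # number of estimated parameters
n  <- nobs(model.small)     # number of observations
aic.manual <- -2 * as.numeric(ll) + 2 * k
bic.manual <- -2 * as.numeric(ll) + log(n) * k
```

BIC penalizes each extra parameter by log(n) instead of 2, so it tends to favor smaller models than AIC once n exceeds about 7.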
step() Function in R
Performs stepwise model selection in the forward, backward, or both directions, comparing models by AIC by default.
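A minimal sketch of the three directions with step(), assuming the built-in mtcars data and five illustrative predictors in place of the life-expectancy variables. The argument trace = 0 only suppresses the step-by-step printout.

```r
# Stand-in data and predictors (assumptions for illustration).
null.model <- lm(mpg ~ 1, data = mtcars)
full.model <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)
search.scope <- list(lower = ~ 1, upper = ~ wt + hp + disp + qsec + drat)

# Forward: start from the intercept-only model and add terms.
forward.fit <- step(null.model, scope = search.scope,
                    direction = "forward", trace = 0)

# Backward: start from the full model and remove terms.
backward.fit <- step(full.model, direction = "backward", trace = 0)

# Hybrid: terms may be added or removed at every step.
hybrid.fit <- step(null.model, scope = search.scope,
                   direction = "both", trace = 0)

print(formula(forward.fit))
print(formula(backward.fit))
print(formula(hybrid.fit))
```

The three searches need not end at the same model, which is one reason to inspect the selected formula and not rely on a single direction.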
Problems with Single-Direction Selection
Once a variable is added (forward) or removed (backward), the decision is never revisited, which can lead to suboptimal models.
Increased collinearity issues.
Automated hypothesis testing increases Type I errors.
Best Subsets Selection
Finds the best model of each size, for subsets of up to 8 variables.
Often uses BIC as the selection criterion.
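The idea can be sketched in base R by fitting every subset of each size and keeping the one with the lowest BIC. This is an exhaustive search written for illustration, not the routine used in the original analysis; dedicated packages such as leaps provide faster implementations. The mtcars data, the five candidate predictors, and the size cap of 3 (the text above mentions 8) are assumptions that keep the example small.

```r
# Stand-in data, candidates, and size cap (assumptions for illustration).
predictors <- c("wt", "hp", "disp", "qsec", "drat")
max.size   <- 3

results <- do.call(rbind, lapply(seq_len(max.size), function(k) {
  # All subsets of exactly k predictors.
  subsets <- combn(predictors, k, simplify = FALSE)
  bics <- sapply(subsets, function(vars) {
    BIC(lm(reformulate(vars, response = "mpg"), data = mtcars))
  })
  best <- which.min(bics)
  data.frame(size = k,
             variables = paste(subsets[[best]], collapse = " + "),
             BIC = bics[best],
             stringsAsFactors = FALSE)
}))

# Best model across sizes according to BIC.
best.overall <- results[which.min(results$BIC), ]
print(results)
print(best.overall)
```

With p candidate predictors there are 2^p possible subsets, so an exhaustive search like this one is only practical for a modest number of variables.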
LASSO Overview
Regularizes coefficient estimates with an L1 penalty controlled by a tuning parameter (λ); larger values of λ shrink more coefficients to exactly zero.
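To show how the penalty drives coefficients to zero, here is a small coordinate-descent sketch in base R that minimizes (1/(2n)) * RSS + λ * sum(|β_j|) on standardized predictors. It is an illustration only: the function names lasso.cd and soft.threshold are hypothetical, the mtcars data is a stand-in, and in practice a dedicated package such as glmnet would normally be used, with λ chosen by cross-validation.

```r
# Soft-thresholding operator: the closed-form update for one coefficient.
soft.threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical helper: cyclic coordinate descent for the LASSO objective.
lasso.cd <- function(X, y, lambda, n.iter = 200) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  for (iter in seq_len(n.iter)) {
    for (j in seq_len(ncol(X))) {
      # Partial residual: remove the fit of all predictors except j.
      r.j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z <- sum(X[, j] * r.j) / n
      beta[j] <- soft.threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  names(beta) <- colnames(X)
  beta
}

# Stand-in data: standardized predictors, centered response.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "qsec", "drat")]))
y <- mtcars$mpg - mean(mtcars$mpg)

beta.small <- lasso.cd(X, y, lambda = 0.1)  # light penalty
beta.large <- lasso.cd(X, y, lambda = 2)    # heavier penalty
beta.huge  <- lasso.cd(X, y, lambda = 10)   # penalty large enough to zero everything

print(round(rbind(beta.small, beta.large, beta.huge), 3))
```

Predictors are standardized first because the penalty treats all coefficients on the same scale; without that step, variables measured in small units would be penalized more heavily.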
Evaluating Final Model
Check if final variables make sense and meet necessary assumptions.
Assess for multicollinearity using Variance Inflation Factor (VIF).
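The VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. A base R sketch follows, assuming the mtcars data and four illustrative predictors as stand-ins; packages such as car offer a ready-made vif() function for fitted models.

```r
# Stand-in predictors (assumption; not the life-expectancy variables).
predictors <- c("wt", "hp", "disp", "qsec")

vif.manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Regress predictor p on the remaining predictors.
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})

print(round(vif.manual, 2))
```

A VIF of 1 means the predictor is uncorrelated with the others. Common rules of thumb treat values above 5 or 10 as a sign of problematic collinearity, though the cutoff is a convention and not a formal test.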
Final Model Interpretation
Adult Mortality and HIV/AIDS negatively impact life expectancy.
Alcohol, Total Expenditure, and Schooling positively influence life expectancy.