Chapter 6: Model Selection

13 Terms

1
Subset Selection
[Model Selection] This involves identifying a subset of the predictors believed to be related to the response and then fitting a model using least squares on this reduced set of variables.
2
Shrinkage (Regularization)
[Model Selection] This approach fits a model using all p predictors but shrinks the coefficient estimates towards zero relative to the least squares estimates. Shrinkage reduces variance and, depending on the penalty used, can also perform variable selection.
3
Dimension Reduction
[Model Selection] This projects the p predictors into an M-dimensional subspace, where M < p, by computing M linear combinations or projections of the variables, which are then used as predictors in a linear regression model fitted by least squares.
4
Best Subset Selection
[Subset Selection] This method fits all possible models containing subsets of the p predictors. It starts with the null model M0 (no predictors) and then, for each k from 1 to p, fits all (p choose k) models with exactly k predictors, selecting the best model Mk as the one with the smallest residual sum of squares (RSS) or, equivalently, the largest R². A single best model is then chosen from among M0, ..., Mp using cross-validation or a metric such as Cp, AIC, BIC, or adjusted R².
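A minimal sketch of the enumeration, assuming a numeric design matrix X with n rows and p columns and a response vector y (the intercept is omitted for brevity):

```python
import itertools
import numpy as np

def best_subset(X, y):
    """Return the best model Mk (lowest RSS) for each subset size k."""
    n, p = X.shape
    best = {}  # k -> (rss, predictor indices)
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = X[:, subset]
            # Fit by least squares and compute the residual sum of squares.
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            if k not in best or rss < best[k][0]:
                best[k] = (rss, subset)
    return best
```

Because it fits 2^p models, this brute-force search quickly becomes computationally infeasible as p grows.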
5
Stepwise Selection
[Subset Selection] These methods offer a computationally efficient alternative to best subset selection, especially when p is large, because they explore a far more restricted set of models.
6
Forward Stepwise Selection
[Subset Selection] Starts with the null model and adds predictors one at a time, at each step choosing the variable that gives the greatest additional improvement to the fit. Because each greedy choice is never revisited, it is not guaranteed to find the best possible model.
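A sketch using scikit-learn's SequentialFeatureSelector on synthetic data; note that it scores candidate variables by cross-validation rather than by training RSS, a slight variation on the classic algorithm, and the target model size of 5 is an arbitrary choice here:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data: 100 observations, 10 candidate predictors.
X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# Greedily add one predictor per step until 5 are selected.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward"
)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask marking the selected predictors
```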
7
Backward Stepwise Selection
[Subset Selection] Starts with the full model containing all p predictors and iteratively removes the least useful predictor one at a time. It requires n > p so that the full model can be fit by least squares.
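The same scikit-learn selector can run the backward variant; with the synthetic data above (n = 100, p = 10) the n > p requirement is satisfied:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# Start from the full 10-predictor model and drop the least useful one per step.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="backward"
)
sfs.fit(X, y)
```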
8
Choosing the Optimal Model
[Subset Selection] The model with the smallest RSS and the largest R² will always be the one containing all of the predictors, because these metrics are based on training error, which is a poor estimate of test error. Test error can instead be estimated directly, using a validation set or cross-validation, or indirectly, by adjusting the training error with metrics such as Cp, AIC, BIC, and adjusted R².
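These adjusted metrics can be computed from a fitted model's summary quantities. A sketch using one common parameterization, where d is the number of predictors and sigma2 is an estimate of the irreducible error variance; all inputs here are hypothetical:

```python
import numpy as np

def selection_criteria(rss, tss, n, d, sigma2):
    """Indirect estimates of test error for a least squares model with d predictors."""
    cp = (rss + 2 * d * sigma2) / n               # Mallows' Cp
    aic = (rss + 2 * d * sigma2) / (n * sigma2)   # Akaike information criterion
    bic = (rss + np.log(n) * d * sigma2) / n      # Bayesian information criterion
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))  # adjusted R²
    return cp, aic, bic, adj_r2
```

Lower Cp, AIC, and BIC indicate a better model, while a higher adjusted R² does; BIC's log(n) factor penalizes large models more heavily than Cp or AIC, so it tends to select smaller ones.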
9
Ridge Regression
[Shrinkage] This method minimizes RSS plus a shrinkage penalty: λ times the sum of the squared coefficients (an L2 penalty). It shrinks the coefficient estimates towards zero, reducing their variance, but does not set any of them exactly to zero, so all p predictors remain in the model.
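A minimal ridge fit with scikit-learn, where alpha plays the role of λ; the predictors are standardized first because the penalty is not scale-invariant:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# Standardize, then minimize RSS + alpha * sum(beta_j^2).
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)  # shrunken, but typically all nonzero
```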
10
The Lasso
[Shrinkage] Like ridge regression, the lasso shrinks the coefficient estimates towards zero, but it uses an L1 penalty (λ times the sum of the absolute values of the coefficients). This forces some coefficient estimates to be exactly zero when λ is sufficiently large, so the lasso performs variable selection and yields sparse models.
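The same setup with the lasso; the value alpha=1.0 is arbitrary, but with only 3 informative predictors in the synthetic data, a sufficiently large alpha leaves most coefficients at exactly zero:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Only 3 of the 10 predictors actually influence the response.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)  # sparse: several entries are exactly 0.0
```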

11
Selecting the Tuning Parameter
[Shrinkage] Cross-validation is used to select the tuning parameter λ for both ridge regression and the lasso: a grid of λ values is chosen, the cross-validation error is computed for each value, and the λ with the smallest cross-validation error is selected. The model is then refit on all available observations using the chosen λ.
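A sketch of this grid search using scikit-learn's LassoCV (RidgeCV works analogously); the grid bounds and fold count are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

alphas = np.logspace(-3, 1, 50)  # grid of candidate λ (alpha) values
cv_model = LassoCV(alphas=alphas, cv=10).fit(X, y)
print(cv_model.alpha_)  # λ with the smallest 10-fold cross-validation error
```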
12
Principal Components Regression (PCR)
[Dimension Reduction] This uses principal component analysis (PCA) to define the linear combinations of the predictors. The first principal component is the linear combination with the largest variance, and subsequent components have the largest variance subject to being uncorrelated with the previous components. The first M components are then used as the predictors in a linear regression model fit by least squares.
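scikit-learn has no single PCR estimator; a standard sketch chains standardization, PCA, and least squares in a pipeline. M = 3 is an arbitrary choice here and would normally be tuned by cross-validation:

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

M = 3  # number of principal components to keep
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pcr.fit(X, y)  # regresses y on the first M components
```

Note that PCA ignores y when building the components, so the directions that best explain the predictors are not guaranteed to be the best directions for predicting the response.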
13
Partial Least Squares (PLS)
[Dimension Reduction] A dimension reduction method that identifies a new set of features that are linear combinations of the original features. Unlike PCR, it identifies these features in a supervised way, using the response to find directions that are related to both the predictors and the response.
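A sketch with scikit-learn's PLSRegression; as with PCR, the number of directions M (n_components) would normally be chosen by cross-validation:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

pls = PLSRegression(n_components=3)  # M = 3 supervised directions
pls.fit(X, y)
print(pls.predict(X[:5]))  # fitted values for the first five observations
```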