Chapter 6: Model Selection

13 Terms

1
Subset Selection
[Model Selection] This involves identifying a subset of the predictors believed to be related to the response and then fitting a model using least squares on this reduced set of variables.
2
Shrinkage (Regularization)
[Model Selection] This approach fits a model using all p predictors but shrinks the coefficient estimates towards zero relative to the least squares estimates. Shrinkage reduces variance and, depending on the penalty used, can also perform variable selection.
3
Dimension Reduction
[Model Selection] This projects the p predictors into an M-dimensional subspace, where M < p, by computing M linear combinations or projections of the variables, which are then used as predictors in a linear regression model fitted by least squares.
4
Best Subset Selection
[Subset Selection] This method fits all possible models containing subsets of the p predictors. It starts with the null model M0 (no predictors) and then, for each k from 1 to p, fits all (p choose k) models with exactly k predictors, selecting the best model Mk as the one with the smallest residual sum of squares (RSS) or, equivalently, the largest R². A single best model is then chosen from among M0, ..., Mp using cross-validation or a metric such as Cp, AIC, BIC, or adjusted R².
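A minimal sketch of the enumeration, assuming a numeric design matrix X with n rows and p columns and a response vector y (the intercept is omitted for brevity):

```python
import itertools
import numpy as np

def best_subset(X, y):
    """Return the best model Mk (lowest RSS) for each subset size k."""
    n, p = X.shape
    best = {}  # k -> (rss, predictor indices)
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = X[:, subset]
            # Fit by least squares and compute the residual sum of squares.
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            if k not in best or rss < best[k][0]:
                best[k] = (rss, subset)
    return best
```

Because it fits 2^p models, this brute-force search quickly becomes computationally infeasible as p grows.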
5
Stepwise Selection
[Subset Selection] These methods offer a computationally efficient alternative to best subset selection, especially when p is large, because they explore a far more restricted set of models.
6
Forward Stepwise Selection
[Subset Selection] Starts with the null model and adds predictors one at a time, at each step choosing the variable that gives the greatest additional improvement to the fit. Because each greedy choice is never revisited, it is not guaranteed to find the best possible model.
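A sketch using scikit-learn's SequentialFeatureSelector on synthetic data; note that it scores candidate variables by cross-validation rather than by training RSS, a slight variation on the classic algorithm, and the target model size of 5 is an arbitrary choice here:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data: 100 observations, 10 candidate predictors.
X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# Greedily add one predictor per step until 5 are selected.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward"
)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask marking the selected predictors
```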
7
Backward Stepwise Selection
[Subset Selection] Starts with the full model containing all p predictors and iteratively removes the least useful predictor one at a time. It requires n > p so that the full model can be fit by least squares.
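The same scikit-learn selector can run the backward variant; with the synthetic data above (n = 100, p = 10) the n > p requirement is satisfied:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# Start from the full 10-predictor model and drop the least useful one per step.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="backward"
)
sfs.fit(X, y)
```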
8
Choosing the Optimal Model
[Subset Selection] The model with the smallest RSS and the largest R² will always be the one containing all of the predictors, because these metrics are based on training error, which is a poor estimate of test error. Test error can instead be estimated directly, using a validation set or cross-validation, or indirectly, by adjusting the training error with metrics such as Cp, AIC, BIC, and adjusted R².
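These adjusted metrics can be computed from a fitted model's summary quantities. A sketch using one common parameterization, where d is the number of predictors and sigma2 is an estimate of the irreducible error variance; all inputs here are hypothetical:

```python
import numpy as np

def selection_criteria(rss, tss, n, d, sigma2):
    """Indirect estimates of test error for a least squares model with d predictors."""
    cp = (rss + 2 * d * sigma2) / n               # Mallows' Cp
    aic = (rss + 2 * d * sigma2) / (n * sigma2)   # Akaike information criterion
    bic = (rss + np.log(n) * d * sigma2) / n      # Bayesian information criterion
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))  # adjusted R²
    return cp, aic, bic, adj_r2
```

Lower Cp, AIC, and BIC indicate a better model, while a higher adjusted R² does; BIC's log(n) factor penalizes large models more heavily than Cp or AIC, so it tends to select smaller ones.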
9
Ridge Regression
[Shrinkage] This method minimizes RSS plus a shrinkage penalty: λ times the sum of the squared coefficients (an L2 penalty). It shrinks the coefficient estimates towards zero, reducing their variance, but does not set any of them exactly to zero, so all p predictors remain in the model.
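A minimal ridge fit with scikit-learn, where alpha plays the role of λ; the predictors are standardized first because the penalty is not scale-invariant:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# Standardize, then minimize RSS + alpha * sum(beta_j^2).
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)  # shrunken, but typically all nonzero
```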
10
The Lasso
[Shrinkage] Like ridge regression, the lasso shrinks the coefficient estimates towards zero, but it uses an L1 penalty (λ times the sum of the absolute values of the coefficients). This forces some coefficient estimates to be exactly zero when λ is sufficiently large, so the lasso performs variable selection and yields sparse models.
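The same setup with the lasso; the value alpha=1.0 is arbitrary, but with only 3 informative predictors in the synthetic data, a sufficiently large alpha leaves most coefficients at exactly zero:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Only 3 of the 10 predictors actually influence the response.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)  # sparse: several entries are exactly 0.0
```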

11
Selecting the Tuning Parameter
[Shrinkage] Cross-validation is used to select the tuning parameter λ for both ridge regression and the lasso: a grid of λ values is chosen, the cross-validation error is computed for each value, and the λ with the smallest cross-validation error is selected. The model is then refit on all available observations using the chosen λ.
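A sketch of this grid search using scikit-learn's LassoCV (RidgeCV works analogously); the grid bounds and fold count are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

alphas = np.logspace(-3, 1, 50)  # grid of candidate λ (alpha) values
cv_model = LassoCV(alphas=alphas, cv=10).fit(X, y)
print(cv_model.alpha_)  # λ with the smallest 10-fold cross-validation error
```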
12
Principal Components Regression (PCR)
[Dimension Reduction] This uses principal component analysis (PCA) to define the linear combinations of the predictors. The first principal component is the linear combination with the largest variance, and subsequent components have the largest variance subject to being uncorrelated with the previous components. The first M components are then used as the predictors in a linear regression model fit by least squares.
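scikit-learn has no single PCR estimator; a standard sketch chains standardization, PCA, and least squares in a pipeline. M = 3 is an arbitrary choice here and would normally be tuned by cross-validation:

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

M = 3  # number of principal components to keep
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pcr.fit(X, y)  # regresses y on the first M components
```

Note that PCA ignores y when building the components, so the directions that best explain the predictors are not guaranteed to be the best directions for predicting the response.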
13
Partial Least Squares (PLS)
[Dimension Reduction] A dimension reduction method that identifies a new set of features that are linear combinations of the original features. Unlike PCR, it identifies these features in a supervised way, using the response to find directions that are related to both the predictors and the response.
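A sketch with scikit-learn's PLSRegression; as with PCR, the number of directions M (n_components) would normally be chosen by cross-validation:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

pls = PLSRegression(n_components=3)  # M = 3 supervised directions
pls.fit(X, y)
print(pls.predict(X[:5]))  # fitted values for the first five observations
```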