These flashcards cover key concepts and methods related to shrinkage approaches in regression analysis, focusing on ridge regression and the lasso.
What is the main purpose of shrinkage methods in regression?
To constrain or regularize coefficient estimates, reducing their variance by shrinking them towards zero.
Name the two best-known techniques for shrinking regression coefficients towards zero.
Ridge regression and the lasso.
What is the key difference between ridge regression and least squares regression?
Ridge regression estimates its coefficients by minimizing the RSS plus a shrinkage penalty, rather than the RSS alone as in least squares.
What does the tuning parameter λ control in ridge regression?
It controls the trade-off between fitting the data well (minimizing RSS) and shrinking the coefficients towards zero.
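A minimal sketch of this trade-off using scikit-learn, where the tuning parameter is called alpha and the data is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.0, 0.5]) + rng.normal(size=100)

# As lambda (called alpha in scikit-learn) grows, the coefficient
# estimates shrink towards zero.
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>8}: ||beta||_2 = {np.linalg.norm(coefs):.3f}")
```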
What happens to the estimates of coefficients in ridge regression as λ approaches 0?
The estimates approach the least squares estimates.
In ridge regression, what is the penalty term added to the RSS?
The penalty term is λ∑βj^2, the sum of the squared coefficients scaled by the tuning parameter λ, which is called a shrinkage penalty.
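Written out in LaTeX, the full ridge criterion (RSS plus the shrinkage penalty, with the intercept left unpenalized as the next card explains) is:

```latex
\hat{\beta}^{R} = \arg\min_{\beta}
\left\{
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \beta_j^{2}
\right\}
```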
Why doesn’t ridge regression shrink the intercept (β0)?
Because the intercept simply measures the mean value of the response when all predictors are zero; it is not an association with any predictor, so there is nothing to shrink towards zero.
What does the lasso do differently than ridge regression?
The lasso can set some coefficients exactly to zero, thus performing variable selection.
Which regularization technique is best for model interpretation?
The lasso, as it results in sparse models involving only a subset of predictors.
What type of penalty does the lasso use?
An ℓ1 penalty, λ∑|βj|, i.e. the sum of the absolute values of the coefficients (whereas ridge uses an ℓ2 penalty, the sum of their squares).
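For comparison with the ridge criterion above, the lasso criterion in LaTeX, identical except for the ℓ1 penalty:

```latex
\hat{\beta}^{L} = \arg\min_{\beta}
\left\{
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} |\beta_j|
\right\}
```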
In terms of bias-variance trade-off, what advantage does ridge regression provide?
As λ increases, ridge regression decreases variance at the expense of introducing some bias.
Why can ridge regression perform well when the number of predictors p is large?
Because it trades off a small increase in bias for a large decrease in variance, avoiding extreme variances in estimates.
What does it mean to standardize predictors before performing ridge regression?
To transform predictors so they all have a standard deviation of one, making the fit invariant to the scale of measurement.
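A minimal sketch of standardizing before ridge, using a scikit-learn pipeline; the data and alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Synthetic predictors on wildly different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])
y = X[:, 0] + 0.1 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(size=100)

# Standardizing first puts every predictor at standard deviation one,
# so the ridge penalty treats them symmetrically regardless of units.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)
```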
How do ridge regression and lasso's approach to feature selection differ?
Ridge regression includes all variables in the model, while the lasso may exclude some by setting coefficients to zero.
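A quick illustration of this difference on synthetic data (scikit-learn again; the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
# Only the first two predictors actually matter.
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge:", np.round(ridge.coef_, 3))  # all six nonzero, just shrunken
print("lasso:", np.round(lasso.coef_, 3))  # irrelevant coefficients set exactly to zero
```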
How does cross-validation help in the context of ridge regression and the lasso?
It helps select the tuning parameter λ: compute the cross-validation error over a grid of λ values and choose the value for which that error is smallest.
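A sketch of this selection using scikit-learn's built-in cross-validated estimators (the alpha grids below are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=200)

# Try a grid of tuning-parameter values and keep the one with the
# lowest cross-validation error.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 50)).fit(X, y)
lasso = LassoCV(alphas=np.logspace(-3, 1, 50), cv=5).fit(X, y)

print("best ridge alpha:", ridge.alpha_)
print("best lasso alpha:", lasso.alpha_)
```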
What is the computational benefit of ridge regression over best subset selection?
Ridge regression fits only a single model for each value of λ, rather than searching through all 2^p possible subsets of predictors.
What Bayesian interpretation can be applied to ridge regression and the lasso?
Ridge regression corresponds to a Gaussian prior on coefficients, while the lasso corresponds to a double-exponential prior.
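A sketch of this correspondence in LaTeX; the scale parameters c and b are illustrative, and in both cases the estimate is the posterior mode under the stated prior:

```latex
% Ridge: i.i.d. Gaussian prior on each coefficient
g(\beta_j) \propto \exp\!\left( -\frac{\beta_j^{2}}{2c} \right)
\quad\Longrightarrow\quad
\hat{\beta}^{R} \text{ is the posterior mode}

% Lasso: i.i.d. double-exponential (Laplace) prior
g(\beta_j) \propto \exp\!\left( -\frac{|\beta_j|}{b} \right)
\quad\Longrightarrow\quad
\hat{\beta}^{L} \text{ is the posterior mode}
```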