Shrinkage Methods

These flashcards cover key concepts and methods related to shrinkage approaches in regression analysis, focusing on ridge regression and the lasso.


17 Terms

1

What is the main purpose of shrinkage methods in regression?

To constrain or regularize coefficient estimates, reducing their variance by shrinking them towards zero.

2

Name the two best-known techniques for shrinking regression coefficients towards zero.

Ridge regression and the lasso.

3

What is the key difference between ridge regression and least squares regression?

Ridge regression estimates its coefficients by minimizing the RSS plus a shrinkage penalty, rather than the RSS alone.
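Written out in the notation used on these cards (λ the tuning parameter, βj the coefficients; this is the standard ISLR-style formulation), the ridge objective is:

```latex
\hat{\beta}^{R} = \arg\min_{\beta} \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}
```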

4

What does the tuning parameter λ control in ridge regression?

It controls the trade-off between fitting the data well (minimizing RSS) and shrinking the coefficients towards zero.

5

What happens to the coefficient estimates in ridge regression as λ approaches 0?

The estimates approach the least squares estimates.
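A minimal sketch of the behaviour described in the last two cards, using scikit-learn (an assumption here: scikit-learn's `alpha` argument plays the role of λ, and `make_regression` just supplies synthetic data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: 100 observations, 5 predictors.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

print("OLS:", np.round(LinearRegression().fit(X, y).coef_, 2))
for lam in [1e-8, 1.0, 100.0, 10000.0]:
    coef = Ridge(alpha=lam).fit(X, y).coef_
    print(f"lambda={lam:g}:", np.round(coef, 2))
# As lambda -> 0 the ridge estimates match OLS;
# as lambda grows, they shrink towards zero.
```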

6

In ridge regression, what is the penalty term added to the RSS?

The penalty term is λ∑βj^2 (λ times the sum of the squared slope coefficients), called a shrinkage penalty.

7

Why doesn't ridge regression shrink the intercept (β0)?

Because β0 simply measures the mean response when all predictors are zero, and we want to retain that estimate rather than shrink it.

8

What does the lasso do differently than ridge regression?

The lasso can set some coefficients exactly to zero, thus performing variable selection.
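A small illustration with scikit-learn's `Lasso` (hypothetical data; whether a given coefficient is zeroed depends on the chosen `alpha` and the data, but with a reasonably large penalty the uninformative predictors drop out):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 10 predictors truly affect the response.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)
print(np.round(lasso.coef_, 2))
print("kept predictors:", np.flatnonzero(lasso.coef_))
# Uninformative predictors get coefficients of exactly 0.0,
# i.e. the lasso has performed variable selection.
```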

9

Which regularization technique is best for model interpretation?

The lasso, as it results in sparse models involving only a subset of predictors.

10

What type of penalty does the lasso use?

An ℓ1 penalty, ∑|βj| (scaled by the tuning parameter λ): the sum of the absolute values of the coefficients.
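Side by side with the ridge objective above, the lasso swaps the squared penalty for the absolute-value one:

```latex
\hat{\beta}^{L} = \arg\min_{\beta} \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```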

11

In terms of bias-variance trade-off, what advantage does ridge regression provide?

As λ increases, ridge regression decreases variance at the expense of introducing some bias.

12

Why can ridge regression perform well when the number of predictors p is large?

Because it trades a small increase in bias for a large decrease in variance; when p is large (or nearly as large as n), least squares estimates can be extremely variable, and the ridge penalty keeps them under control.

13

What does it mean to standardize predictors before performing ridge regression?

To transform predictors so they all have a standard deviation of one, making the fit invariant to the scale of measurement.
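A sketch of how this is typically done in practice (assuming scikit-learn; `StandardScaler` rescales each predictor to unit standard deviation before the ridge fit):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
X[:, 0] *= 1000.0  # same predictor, much larger measurement scale

# Standardizing first means the penalty treats all coefficients
# on an equal footing, regardless of each predictor's units.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
print(model.named_steps["ridge"].coef_)
```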

14

How do ridge regression's and the lasso's approaches to feature selection differ?

Ridge regression includes all variables in the model, while the lasso may exclude some by setting coefficients to zero.

15

How does cross-validation help in the context of ridge regression and the lasso?

It helps select the optimal tuning parameter λ by comparing cross-validation errors across a grid of candidate values.
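For instance (a sketch assuming scikit-learn, whose `RidgeCV`/`LassoCV` estimators do this grid search internally):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Each estimator tries a grid of candidate lambdas and keeps the one
# with the lowest cross-validation error.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 50), cv=5).fit(X, y)
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
print("chosen lambda (ridge):", ridge.alpha_)
print("chosen lambda (lasso):", lasso.alpha_)
```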

16

What is the computational benefit of ridge regression over best subset selection?

Ridge regression fits only a single model for each value of λ, rather than searching through all 2^p subsets of predictors (already over a million models for p = 20).

17

What Bayesian interpretation can be applied to ridge regression and the lasso?

Ridge regression corresponds to a Gaussian prior on coefficients, while the lasso corresponds to a double-exponential prior.
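In symbols (a standard result; stated here assuming independent priors on the slopes and taking the posterior mode as the estimate):

```latex
\beta_j \overset{\text{iid}}{\sim} \mathcal{N}(0, \tau^{2}) \;\Rightarrow\; \text{posterior mode} = \hat{\beta}^{\,\text{ridge}},
\qquad
\beta_j \overset{\text{iid}}{\sim} \text{Laplace}(0, b) \;\Rightarrow\; \text{posterior mode} = \hat{\beta}^{\,\text{lasso}}
```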