ECON 491 Lecture 14: Shrinkage/Regularization


Shrinkage Methods (also known as Regularization)

Use all p predictors to fit a model; however, the estimated coefficients are shrunken (or constrained, or regularized) toward zero relative to the least squares estimates.

With some shrinkage methods, some of the coefficients may be estimated to be exactly zero, so shrinkage methods can also perform variable selection.


Two popular shrinkage techniques:

→ Ridge Regression

→ Lasso Regression


Ridge Regression

→ Ridge regression chooses the coefficient estimates that minimize RSS + λ Σⱼ βⱼ², where the sum runs over j = 1, …, p. The second term is called the shrinkage penalty. (A minimal code sketch follows.)
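
A minimal sketch of this objective, assuming scikit-learn is available and using simulated data from make_regression; scikit-learn's alpha parameter plays the role of λ, and the intercept is fit but not penalized:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Simulated data: n = 100 observations, p = 10 predictors (hypothetical example).
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

lam = 5.0  # tuning parameter lambda; scikit-learn calls it `alpha`
ridge = Ridge(alpha=lam).fit(X, y)

# The two pieces of the ridge objective: RSS and the shrinkage penalty.
rss = np.sum((y - ridge.predict(X)) ** 2)
penalty = lam * np.sum(ridge.coef_ ** 2)
print("RSS:", round(rss, 1), " shrinkage penalty:", round(penalty, 1))
```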


What is the shrinkage penalty?

The penalty is small when β1, …, βp are close to zero, so it has the effect of forcing the estimates toward zero.

→ The tuning parameter λ ≥ 0 controls the relative impact of the RSS and the shrinkage penalty.

→ When λ = 0, the penalty has no effect and ridge regression produces the least squares estimates.

→ As λ → ∞, the penalty dominates and the ridge estimates are pushed closer to zero (illustrated in the sketch below).

→ Cross-validation (CV) is used to select a good value for λ.

Note: The shrinkage penalty does not apply to the intercept β0; it applies only to β1, …, βp.
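
A small sketch of this behavior of λ, again assuming scikit-learn and simulated data (alpha plays the role of λ): near λ = 0 the ridge fit matches least squares, and the size of the coefficient vector shrinks as λ grows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Least squares fit for reference (this is what ridge approaches as lambda -> 0).
ols = LinearRegression().fit(X, y)
print(f"least squares: ||beta||_2 = {np.linalg.norm(ols.coef_):.2f}")

# As lambda grows, the ridge coefficient estimates are pulled toward zero.
for lam in [0.01, 1, 100, 10_000]:
    ridge = Ridge(alpha=lam).fit(X, y)
    print(f"lambda = {lam:>8}: ||beta||_2 = {np.linalg.norm(ridge.coef_):.2f}")
```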


The LASSO (Least Absolute Shrinkage and Selection Operator)

→ Ridge regression will include all p predictors in the final model, which is a disadvantage for interpretation; the lasso overcomes this problem.

→ As with ridge regression, the lasso shrinks the coefficient estimates toward zero. However, in the case of the lasso, the L1 penalty forces some of the coefficient estimates to be exactly equal to zero when the tuning parameter λ is sufficiently large (as the sketch below illustrates).

→ Much like best subset selection, the lasso performs variable selection.

→ As in ridge regression, selecting a good value of λ for the lasso is critical; CV is again the method of choice.

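A minimal illustration of this selection effect, assuming scikit-learn and synthetic data in which only a few predictors matter; note that scikit-learn's Lasso scales the RSS term by 1/(2n), so its alpha is a rescaled version of λ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 20 predictors truly affect the response (hypothetical data).
X, y = make_regression(n_samples=200, n_features=20, n_informative=3,
                       noise=5.0, random_state=0)

# With a sufficiently large lambda, the L1 penalty forces many coefficient
# estimates to be exactly zero, so the lasso also performs variable selection.
for lam in [0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=lam).fit(X, y)
    print(f"lambda = {lam:>4}: {np.count_nonzero(lasso.coef_)} nonzero coefficients out of 20")
```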

Comparing the Lasso and the Ridge Regression (figure)


Comparing the Lasso and the Ridge Regression (cont'd)

→ These two examples illustrate that neither ridge nor the lasso will universally dominate the other.

→ In general, one might expect the lasso to perform better when the response is a function of only a relatively small number of predictors.

→ However, the number of predictors related to the response is never known a priori for real data sets.

→ A technique such as cross-validation can be used to determine which approach is better on a particular data set (see the sketch after this list).

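A rough sketch of such a comparison, assuming scikit-learn; the data are simulated with only a few informative predictors, and the fixed λ value is arbitrary (in practice each method's λ would be tuned first, as in the next card).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Sparse setting: only a handful of the 50 predictors drive the response,
# which tends to favor the lasso; with many moderate effects, ridge often wins.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=1)

for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=1.0))]:
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"{name}: 5-fold CV MSE = {cv_mse:.1f}")
```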

Selecting the Tuning Parameter λ for Ridge Regression and Lasso

→ Cross-validation (CV) is used.

→ We choose a grid of λ values and compute the CV error for each value of λ.

→ We then select the tuning parameter value for which the CV error is smallest.

→ Finally, the model is re-fit using all of the available observations and the selected value of the tuning parameter (a minimal sketch follows).
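
A minimal sketch of this procedure for the lasso, assuming scikit-learn; LassoCV evaluates a user-supplied grid of λ (alpha) values by cross-validation, and the explicit re-fit at the end simply mirrors the last step above (LassoCV already performs it internally).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=200, n_features=20, n_informative=3,
                       noise=5.0, random_state=0)

# 1. Choose a grid of lambda values and compute the CV error for each.
# 2. LassoCV keeps the lambda with the smallest cross-validated error.
lambda_grid = np.logspace(-3, 2, 50)
cv_fit = LassoCV(alphas=lambda_grid, cv=5).fit(X, y)
print("selected lambda:", cv_fit.alpha_)

# 3. Re-fit on all observations using the selected lambda.
final_model = Lasso(alpha=cv_fit.alpha_).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(final_model.coef_))
```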