Lecture 4: Shrinkage Methods

Last updated 3:38 PM on 2/19/26

70 Terms

1

What are Ridge and Lasso alternatives to?

Ordinary least squares (OLS) regression

2

Are the Ridge/Lasso estimators biased?

Yes

3

Do Ridge/Lasso or OLS estimates have the smaller mse?

Ridge/Lasso often have the lower mse

4

What type of method are Ridge and Lasso?

Shrinkage methods

5

What do we mean by shrinkage methods?

They shrink the estimated coefficients towards zero, so the estimates are smaller in magnitude than those from an OLS regression

6

How does the shrinking of estimates differ between Ridge and Lasso?

Ridge shrinks the estimates but does not set any to zero; Lasso shrinks the estimates and sets some to zero
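The contrast can be sketched for a single coefficient. Assuming one standardised regressor with sum(x^2) = 1 (an assumption made here for illustration, not taken from the lecture), minimising (b - β)² + λβ² gives the Ridge rule b / (1 + λ), while minimising (b - β)² + λ|β| gives soft-thresholding at λ/2. The function names and the OLS values are hypothetical.

```python
def ridge_shrink(b_ols, lam):
    """Ridge rule for one coefficient when sum(x^2) = 1: proportional shrinkage."""
    return b_ols / (1.0 + lam)


def lasso_shrink(b_ols, lam):
    """Lasso rule for one coefficient when sum(x^2) = 1: soft-thresholding at lam / 2."""
    threshold = lam / 2.0
    if b_ols > threshold:
        return b_ols - threshold
    if b_ols < -threshold:
        return b_ols + threshold
    return 0.0  # small coefficients are set exactly to zero


ols_estimates = [3.0, -0.4, 0.1]  # hypothetical OLS coefficients
lam = 1.0
ridge = [ridge_shrink(b, lam) for b in ols_estimates]
lasso = [lasso_shrink(b, lam) for b in ols_estimates]
print("ridge:", ridge)
print("lasso:", lasso)
```

In this sketch every Ridge value is smaller in magnitude but non-zero, while Lasso sets the two small coefficients to zero.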

7

Which of Ridge and Lasso is often preferred for model selection?

Lasso

8

In this example, how do we generate the explanatory and dependent variables?

9

How do we use the data to obtain the mse for each model?

10

What do the mses from each model tell us?

11

For Ridge, how many observations and explanatory variables do we have?

N observations; p explanatory variables

12

How does the OLS estimator choose the parameters β0, β1, …, βp? What does it aim to do with them?

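In standard notation (assumed here: y_i is the dependent variable and x_ij is observation i on explanatory variable j), OLS chooses the parameters to minimise the residual sum of squares:

```latex
\hat{\beta}^{\text{OLS}} = \arg\min_{\beta_0,\dots,\beta_p} \text{RSS},
\qquad
\text{RSS} = \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```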
13

How does the Ridge estimator choose the parameters β0, β1, …, βp? What does it aim to do with them?

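A standard way to write the Ridge criterion, using the same assumed notation (y_i, x_ij), is RSS plus a penalty on the squared slope coefficients. The penalty sum starts at j = 1, so the intercept is not penalised:

```latex
\hat{\beta}^{\text{Ridge}} = \arg\min_{\beta_0,\dots,\beta_p}
\left\{ \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{p} \beta_j^2 \right\},
\qquad \lambda \ge 0
```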
14

What does λ represent?

A tuning parameter that controls the strength of the shrinkage penalty

15

What is this known as and why?

L2 (shrinkage) penalty; penalises the squared values of β

16

How do we choose the value of λ?

By k-fold cross-validation (kFCV)

17

What does the shrinkage penalty not apply to?

The intercept, β0

18

What does the shrinkage penalty do to the estimated Ridge parameters?

It makes them smaller overall than the estimated OLS parameters (their sum of squares is smaller); however, an individual Ridge estimate can be larger than its OLS counterpart

19

When λ = 0, how do the Ridge and OLS estimates compare?

The penalty vanishes, so Ridge chooses parameters to minimise RSS alone; the Ridge and OLS estimates are the same

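20

When λ → ∞, how do the Ridge and OLS estimates compare?

The Ridge estimates of β1, …, βp shrink to zero (the intercept β0 is not penalised), whereas the OLS estimates do not depend on λ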
21

How can we express the observations in a matrix?

22

How can we express the observations of y in a vector?

23

How can we express the identity matrix of order p+1?

24

How can we then express the OLS and Ridge estimators?

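The matrix forms asked about in the last four cards are usually written as follows. The layout (a leading column of ones for the intercept) and the symbols are assumptions based on the standard textbook presentation:

```latex
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1p} \\
\vdots & \vdots & & \vdots \\
1 & x_{N1} & \cdots & x_{Np}
\end{pmatrix},
\quad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix},
\quad
I_{p+1} = \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{pmatrix}

\hat{\beta}^{\text{OLS}} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{\beta}^{\text{Ridge}} = (X^{\top} X + \lambda I_{p+1})^{-1} X^{\top} y
```

Taken literally, adding λ to every diagonal entry would also penalise the intercept. A common convention, assumed here rather than taken from the cards, is to centre the variables (or set the first diagonal entry to zero) so that β0 is left unpenalised.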
25

What factor results in the Ridge estimator being smaller?

The matrix that is inverted is larger with Ridge than with OLS (λ times the identity matrix is added to it), so its inverse is smaller
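A small numerical sketch of this effect, assuming two centred regressors and no intercept so that the 2 x 2 inverse can be written out by hand. The data and the function name `ridge_2d` are hypothetical; with λ = 0 the function returns the OLS estimates.

```python
def ridge_2d(X, y, lam):
    """Closed-form Ridge for two centred regressors: (X'X + lam*I)^(-1) X'y."""
    # Entries of the symmetric 2 x 2 matrix X'X + lam*I
    a = sum(row[0] * row[0] for row in X) + lam
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X) + lam
    # Entries of the vector X'y
    g0 = sum(row[0] * yi for row, yi in zip(X, y))
    g1 = sum(row[1] * yi for row, yi in zip(X, y))
    det = a * d - b * b
    # Apply the explicit 2 x 2 inverse to X'y
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


X = [(-1.0, -0.9), (0.0, 0.1), (1.0, 0.8)]  # hypothetical centred data
y = [-2.0, 0.1, 1.9]

for lam in (0.0, 1.0, 10.0):
    b1, b2 = ridge_2d(X, y, lam)
    print(lam, b1, b2, b1 * b1 + b2 * b2)  # last value: sum of squared coefficients
```

The printed sum of squared coefficients falls as λ rises, because a larger matrix is being inverted.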

26

How might the ridge estimation look on a graph?

27

What do the red ellipses represent?

Contours with the same value of RSS

28

Where do we see the lowest value of RSS?

At β̂; these are the OLS estimates

29

As we move away from β̂, what happens to the RSS?

RSS increases

30

What constraint does the blue circle represent?

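With two explanatory variables, the Ridge constraint region is usually written as a disc. The budget symbol s is an assumption here; a larger λ corresponds to a smaller s:

```latex
\beta_1^2 + \beta_2^2 \le s
```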
31

Where do we find the Ridge estimator?

Where the blue circle is tangent to one of the red ellipses (the lowest RSS contour that it reaches)

32

When does the tangency occur regarding parameter values?

When both parameters are positive

33

What does Ridge do to the values of the parameters relative to the OLS estimates?

Reduces the values of the parameters but does not set any to zero

34

How does the Lasso estimator choose the parameters β0, β1, …, βp? What does it aim to do with them?

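A standard way to write the Lasso criterion, in the same assumed notation (y_i, x_ij), replaces the squared coefficients in the penalty with absolute values:

```latex
\hat{\beta}^{\text{Lasso}} = \arg\min_{\beta_0,\dots,\beta_p}
\left\{ \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad \lambda \ge 0
```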
35

What is this known as and why?

L1 penalty; penalises the absolute values of β

36

What does each value of λ give?

A different Lasso model, just as each value of λ gives a different Ridge model

37

How do we express the Lasso estimator?

There is no simple closed-form expression, unlike for Ridge

38

What would Lasso look like on a graph for two explanatory variables?

39

What constraint does the blue diamond represent?

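With two explanatory variables, the Lasso constraint region is usually written as a diamond whose corners lie on the axes. The budget symbol s is an assumption here:

```latex
\lvert \beta_1 \rvert + \lvert \beta_2 \rvert \le s
```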
40

At what point is the Lasso estimator obtained?

Where the blue diamond first touches one of the red ellipses

41

When does tangency occur regarding the values of parameters?

When one of the parameters is zero

42

What happens to the parameters that are equal to zero?

The corresponding explanatory variables are eliminated from the model; the shrinkage penalty may shrink some parameters exactly to zero

43

What is the equation for mse using variance and bias?

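The usual decomposition for an estimator of a parameter θ (the symbol θ is assumed here) is variance plus squared bias. For prediction error, an irreducible noise term is added on top:

```latex
\text{MSE}(\hat{\theta}) = E\bigl[(\hat{\theta} - \theta)^2\bigr]
= \text{Var}(\hat{\theta}) + \bigl[\text{Bias}(\hat{\theta})\bigr]^2,
\qquad
\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta
```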
44

Given that Ridge and Lasso are biased estimators, how can they achieve a lower mse?

By having a lower variance, which can more than offset the squared bias
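The trade-off can be illustrated with a toy shrinkage estimator rather than Ridge itself: scaling a sample mean by a factor c < 1. Assuming n independent draws with mean mu and variance sigma2 (all values below are hypothetical), the bias is (c - 1) * mu and the variance is c^2 * sigma2 / n.

```python
def mse_of_scaled_mean(c, mu, sigma2, n):
    """mse, bias and variance of the estimator c * (sample mean)."""
    bias = (c - 1.0) * mu
    variance = c * c * sigma2 / n
    return variance + bias ** 2, bias, variance


mu, sigma2, n = 1.0, 4.0, 4  # hypothetical population mean, variance, sample size

mse_unbiased, bias_u, var_u = mse_of_scaled_mean(1.0, mu, sigma2, n)
mse_shrunk, bias_s, var_s = mse_of_scaled_mean(0.8, mu, sigma2, n)

print("unbiased:", mse_unbiased, bias_u, var_u)
print("shrunk:  ", mse_shrunk, bias_s, var_s)
```

The shrunk estimator is biased, yet its variance falls by more than its squared bias rises, so its mse is lower in this example.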

45

OLS is prone to overfitting. What is one symptom of overfitting by OLS and when does it occur?

46

How do Ridge and Lasso counteract the tendency of OLS to overfit and then obtain a lower mse?

They penalise large coefficients, which gives better predictions on the test data and therefore a lower mse

47

When is the tendency to overfit especially marked?

When there is multicollinearity in the training data

48

In R, how do we prepare the data for Ridge estimation?

49

How do we create the matrix of observations on explanatory variables and what has it done to the factors?

50

How do we generate the dependent variable y?

51

How do we generate a grid?

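The lecture's R code is not reproduced on this card. As a language-neutral sketch in plain Python, a grid of candidate λ values is typically spaced evenly on a log scale. The endpoints (10^10 down to 10^-2), the 100 grid points and the function name are assumptions chosen for illustration.

```python
def lambda_grid(max_exp=10.0, min_exp=-2.0, length=100):
    """Return `length` values from 10**max_exp down to 10**min_exp, evenly spaced in log10."""
    step = (max_exp - min_exp) / (length - 1)
    return [10.0 ** (max_exp - i * step) for i in range(length)]


grid = lambda_grid()
print(len(grid), grid[0], grid[-1])
```

A wide grid like this covers both the near-OLS case (very small λ) and the case where almost all coefficients are shrunk to zero (very large λ).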
52

What are the grid values used for?

Different values of the tuning parameter

53

How do we split the data into training and test data?

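The card's own code is not reproduced here. A minimal plain-Python sketch of a random split is shown below; the 50/50 split, the seed and the function name are assumptions, not the lecture's choices.

```python
import random


def train_test_split_indices(n, train_fraction=0.5, seed=1):
    """Randomly partition the indices 0..n-1 into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(n * train_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


train_idx, test_idx = train_test_split_indices(10)
print(train_idx, test_idx)
```

The model is then fitted on the training rows only, and the mse is computed on the held-out test rows.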
54

How do we run the Ridge models?

55

When λ is very small (log λ is negative), what estimates do we obtain?

Estimates very close to the OLS estimates

56

What happens as λ increases?

The estimated coefficients get smaller

57

What happens as λ → ∞?

Estimated coefficients become zero (except the constant)

58

How do we obtain the mse when λ = 0?

59

How does the mse differ for λ = 10^10 and λ = 10? What does it tell us?

60

How do we find the optimal value of λ using cross-validation?

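The lecture's code for this step is not reproduced on the card. The idea of k-fold cross-validation can be sketched in plain Python for the simplest case, assuming one centred regressor and no intercept, where the Ridge slope is sum(xy) / (sum(x^2) + λ). The data, the grid and the function names are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Ridge slope for one centred regressor with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def kfold_cv_mse(xs, ys, lam, k=5):
    """Average held-out mse over k folds; fold f holds out indices f, f+k, f+2k, ..."""
    n = len(xs)
    fold_mses = []
    for f in range(k):
        test = set(range(f, n, k))
        train_x = [xs[i] for i in range(n) if i not in test]
        train_y = [ys[i] for i in range(n) if i not in test]
        beta = ridge_slope(train_x, train_y, lam)  # fit on the other k-1 folds
        errors = [(ys[i] - beta * xs[i]) ** 2 for i in test]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k


xs = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.2, 0.5, 1.0, 1.5, 2.0]  # hypothetical data
ys = [-4.3, -2.6, -2.4, -0.7, -0.9, 0.8, 0.6, 2.5, 2.7, 4.2]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]  # candidate values of the tuning parameter

cv_mse = {lam: kfold_cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(grid, key=lambda lam: cv_mse[lam])
print(best_lam, cv_mse[best_lam])
```

The optimal λ is the grid value with the smallest cross-validated mse; the final model is then refitted with that value.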
61

What does the distribution of mse for each value of log(λ) look like?

62

How can we find the mse for this optimal λ?

63

How do we find the estimated coefficients for the optimal λ?

64

How do we estimate the models using Lasso?

65

What does the plot of estimated coefficients for each value of log λ look like for Lasso?

66

How do we obtain the optimal value of λ for Lasso?

67

What does the plot of mse and log λ look like?

68

How do we obtain the mse for the optimal λ in Lasso?

69

How can the estimated coefficients for the optimal λ be found for Lasso?

70

What can we infer from the Lasso coefficients at the optimal λ?
