Lecture 4: Shrinkage Methods

Last updated 3:38 PM on 2/19/26

70 Terms

What are Ridge and Lasso alternatives to?

Ordinary least squares (OLS) regression

Are the Ridge/Lasso estimators biased?

Yes

Do Ridge/Lasso or OLS regression estimates have a smaller mse?

Ridge/Lasso often have the lower mse

What type of method are Ridge and Lasso?

Shrinkage methods

What do we mean by shrinkage methods?

They shrink the estimated coefficients towards zero, so their overall size is smaller than under OLS regression

How does the shrinking of estimates differ between Ridge and Lasso?

Ridge shrinks the estimates but does not set any to zero; Lasso shrinks the estimates and sets some to zero
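
A minimal sketch of this difference, in Python for illustration (the lecture's own examples are in R). It assumes the special case of orthonormal explanatory variables, where both estimators have simple closed forms for the penalised objectives RSS + λ Σ β_j² and RSS + λ Σ |β_j|. The function names and the numbers are hypothetical.

```python
import math

def ridge_shrink(b_ols, lam):
    """Ridge under orthonormal regressors: scale the OLS estimate by 1 / (1 + lam)."""
    return b_ols / (1.0 + lam)

def lasso_shrink(b_ols, lam):
    """Lasso under orthonormal regressors: soft-threshold the OLS estimate at lam / 2."""
    magnitude = max(abs(b_ols) - lam / 2.0, 0.0)
    return 0.0 if magnitude == 0.0 else math.copysign(magnitude, b_ols)

ols_estimates = [3.0, -1.5, 0.4]   # made-up OLS coefficients
lam = 1.0                          # made-up tuning parameter

ridge_estimates = [ridge_shrink(b, lam) for b in ols_estimates]
lasso_estimates = [lasso_shrink(b, lam) for b in ols_estimates]

print(ridge_estimates)  # every coefficient is scaled down, none is exactly zero
print(lasso_estimates)  # the smallest coefficient is set exactly to zero
```

Ridge rescales every coefficient proportionally, whereas Lasso subtracts a fixed amount from each magnitude, which is why small coefficients hit zero.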

Which of Ridge and Lasso is often preferred for model selection?

Lasso

In this example, how do we generate the explanatory and dependent variables?

How do we use the data to obtain the mse for each model?

What do the mses from each model tell us?

For Ridge, how many observations and explanatory variables do we have?

N observations; p explanatory variables

How does the OLS estimator choose the parameters β₀, β₁, …, β_p? What does it aim to do with them?
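
The formula this card refers to is not reproduced in the text. The standard least-squares criterion, written for N observations and p explanatory variables, is sketched below; the indices i and j and the notation x_{ij} are assumptions about the lecture's notation.

```latex
\hat{\beta}^{\text{OLS}}
= \arg\min_{\beta_0, \beta_1, \dots, \beta_p}
\sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
= \arg\min_{\beta} \; \text{RSS}
```

OLS therefore picks the parameters that minimise the residual sum of squares (RSS), with no penalty on their size.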

How does the Ridge estimator choose the parameters β₀, β₁, …, β_p? What does it aim to do with them?
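
The formula this card refers to is not reproduced in the text. The standard Ridge criterion is sketched below, assuming the usual notation (i indexes the N observations, j the p explanatory variables, and RSS is the residual sum of squares). It is consistent with the cards that follow: λ is the tuning parameter, the penalty is on squared coefficients, and the sum starts at j = 1 so the intercept β₀ is not penalised.

```latex
\hat{\beta}^{\text{Ridge}}
= \arg\min_{\beta_0, \beta_1, \dots, \beta_p}
\Biggl\{
\sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{p} \beta_j^2
\Biggr\}
= \arg\min_{\beta} \Bigl\{ \text{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2 \Bigr\},
\qquad \lambda \ge 0
```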

What is λ representing?

Tuning parameter

What is the Ridge penalty term, λ Σ β_j², known as and why?

L2 (shrinkage) penalty; penalises the squared values of β

How do we choose the value of λ?

Cross-validation (kFCV)
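
k-fold cross-validation for λ can be sketched as follows, in Python for illustration (the lecture works in R). To stay self-contained, the sketch assumes a single centred explanatory variable and no intercept, so the Ridge slope is Σxy / (Σx² + λ). The data, the grid, the interleaved fold assignment and all function names are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Ridge slope for one centred regressor and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def kfold_cv_mse(xs, ys, lam, k):
    """Average squared prediction error over k held-out folds."""
    n = len(xs)
    total_sq_err = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))            # every k-th observation
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        slope = ridge_slope(train_x, train_y, lam)   # fit on the other folds
        total_sq_err += sum((ys[i] - slope * xs[i]) ** 2 for i in held_out)
    return total_sq_err / n

def choose_lambda(xs, ys, grid, k=5):
    """Return the grid value with the smallest cross-validated mse."""
    return min(grid, key=lambda lam: kfold_cv_mse(xs, ys, lam, k))

xs = [-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]   # made-up data
ys = [-8.1, -7.9, -4.2, -3.6, -0.4, 1.7, 2.2, 5.9, 6.3, 9.6]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]                            # candidate lambdas

best_lam = choose_lambda(xs, ys, grid)
print(best_lam, kfold_cv_mse(xs, ys, best_lam, 5))
```

Each candidate λ is scored only on observations that were left out when fitting, so the chosen value reflects out-of-sample rather than in-sample fit.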

What does the shrinkage penalty not apply to?

The intercept, β₀

What does the shrinkage penalty do to the estimated Ridge parameters?

Makes them smaller overall (in terms of the sum of squared coefficients) than the OLS estimates; however, an individual Ridge estimate can be larger than its OLS counterpart

When λ = 0, how do the Ridge and OLS estimates compare?

Ridge chooses parameters to minimise RSS; Ridge and OLS estimates are the same

When λ → ∞, how do the Ridge and OLS estimates compare?

How can we express the observations in a matrix?

How can we express the observations of y in a vector?

How can we express the identity matrix of dimension p+1?

How can we then express the OLS and Ridge estimators?
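
The matrix expressions asked about in the last four cards are not reproduced in the text. A sketch in standard notation follows, assuming X carries a leading column of ones for the intercept, which is what makes the identity matrix (p+1)-dimensional.

```latex
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & \cdots & x_{Np}
\end{pmatrix},
\qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix},
\qquad
I_{p+1} = \begin{pmatrix}
1 & & \\
 & \ddots & \\
 & & 1
\end{pmatrix}

\hat{\beta}^{\text{OLS}} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{\beta}^{\text{Ridge}} = (X^{\top} X + \lambda I_{p+1})^{-1} X^{\top} y
```

Written this way the penalty also touches the intercept. To leave β₀ unpenalised, the first diagonal element of the identity matrix can be set to zero, or the data can be centred before estimation.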

What factor results in the Ridge estimator being smaller?

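The matrix that is inverted is larger with Ridge (λ times the identity matrix is added to X'X) than with OLS, so its inverse, and hence the estimator, is smaller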
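
A small numerical sketch of this effect, in Python for illustration (the lecture uses R). It assumes one centred explanatory variable and no intercept, so the matrix expressions reduce to scalars: Σxy / Σx² for OLS and Σxy / (Σx² + λ) for Ridge. The data and the function name are made up.

```python
def ridge_slope(xs, ys, lam):
    """Scalar Ridge estimate; lam = 0 gives the OLS estimate."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)   # a larger denominator gives a smaller estimate

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]    # made-up centred regressor
ys = [-4.1, -1.9, 0.2, 2.1, 3.8]    # made-up responses

lambdas = [0.0, 1.0, 10.0, 1000.0]
slopes = {lam: ridge_slope(xs, ys, lam) for lam in lambdas}
for lam, slope in slopes.items():
    print(f"lambda = {lam:>6}: slope = {slope:.4f}")
```

The slope falls steadily as λ grows and approaches zero for very large λ, without ever reaching it exactly.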

How might the ridge estimation look on a graph?

What do the red ellipses represent?

Contours with the same value of RSS

Where do we see the lowest value of RSS?

At β̂; these are the OLS estimates

As we move away from β̂, what happens to the RSS?

RSS increases

What is the constraint represented by the blue circle?
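
The constraint itself is not reproduced in the text. With two explanatory variables, Ridge can equivalently be written as minimising RSS subject to a budget on the squared coefficients. The budget symbol s is an assumed name; a smaller s corresponds to a larger λ.

```latex
\min_{\beta} \; \text{RSS}
\quad \text{subject to} \quad
\beta_1^2 + \beta_2^2 \le s
```

The boundary of this region is a circle centred at the origin.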

Where do we find the Ridge estimator?

Where the blue circle is tangential to the red ellipses

When does the tangency occur regarding parameter values?

When both parameters are non-zero (both positive in this example)

What does Ridge do to the values of the parameters relative to the OLS estimates?

Reduces the values of the parameters but does not set any to zero

How does the Lasso estimator choose the parameters β₀, β₁, …, β_p? What does it aim to do with them?
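
The formula this card refers to is not reproduced in the text. The standard Lasso criterion is sketched below, assuming the usual notation (i indexes the N observations, j the p explanatory variables, and RSS is the residual sum of squares). As with Ridge, the intercept β₀ is left out of the penalty.

```latex
\hat{\beta}^{\text{Lasso}}
= \arg\min_{\beta_0, \beta_1, \dots, \beta_p}
\Biggl\{
\sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\Biggr\}
= \arg\min_{\beta} \Bigl\{ \text{RSS} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \Bigr\},
\qquad \lambda \ge 0
```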

What is the Lasso penalty term, λ Σ |β_j|, known as and why?

L1 penalty; penalises the absolute values of β

What does each value of λ give?

A different Lasso model, as with Ridge

How do we express the Lasso estimator?

There is no simple closed-form expression, unlike Ridge

What would Lasso look like on a graph for two explanatory variables?

What is the constraint represented by the blue diamond?
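
The constraint itself is not reproduced in the text. With two explanatory variables, Lasso can equivalently be written as minimising RSS subject to a budget on the absolute coefficients. The budget symbol s is an assumed name; a smaller s corresponds to a larger λ.

```latex
\min_{\beta} \; \text{RSS}
\quad \text{subject to} \quad
\lvert \beta_1 \rvert + \lvert \beta_2 \rvert \le s
```

The boundary of this region is a diamond whose corners lie on the axes, which is why the solution can land at a point where one coefficient is exactly zero.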

At what point is the Lasso estimator obtained?

Where the blue diamond is tangential to the red ellipses

When does tangency occur regarding the values of parameters?

When one of the parameters is zero

What happens to the parameters that are equal to zero?

The corresponding explanatory variables are eliminated from the model; the shrinkage penalty can shrink some parameters exactly to zero

What is the equation for mse using variance and bias?
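
The equation is not reproduced in the text. The standard decomposition, stated here for an estimator of a single parameter β, is sketched below; the lecture may instead state it for test-set prediction error, in which case an irreducible error variance term is added.

```latex
\text{MSE}(\hat{\beta})
= E\bigl[ (\hat{\beta} - \beta)^2 \bigr]
= \text{Var}(\hat{\beta}) + \bigl[ \text{Bias}(\hat{\beta}) \bigr]^2,
\qquad
\text{Bias}(\hat{\beta}) = E[\hat{\beta}] - \beta
```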

Given Ridge and Lasso are biased estimates, how can they achieve a lower mse?

By having a lower variance that more than offsets their squared bias
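
A worked sketch of this trade-off, in Python for illustration (the lecture uses R). It assumes a single centred explanatory variable with no intercept, true slope β, error variance σ² and Sxx = Σx², for which the Ridge estimator equals c times the OLS estimator with c = Sxx / (Sxx + λ). All numbers and names are made up.

```python
def ridge_mse_parts(beta, sigma2, sxx, lam):
    """Return (variance, squared bias, mse) of the scalar Ridge estimator."""
    c = sxx / (sxx + lam)               # shrinkage factor; c = 1 when lam = 0 (OLS)
    variance = c ** 2 * sigma2 / sxx    # OLS variance sigma2 / sxx, scaled by c^2
    bias_sq = ((c - 1.0) * beta) ** 2   # E[estimate] - beta = (c - 1) * beta
    return variance, bias_sq, variance + bias_sq

ols_parts = ridge_mse_parts(beta=1.0, sigma2=25.0, sxx=10.0, lam=0.0)
ridge_parts = ridge_mse_parts(beta=1.0, sigma2=25.0, sxx=10.0, lam=10.0)

print(ols_parts)    # (2.5, 0.0, 2.5): unbiased, but high variance
print(ridge_parts)  # (0.625, 0.25, 0.875): biased, but lower mse
```

In this example the variance falls by more than the squared bias rises, so the biased estimator has the smaller mse. With a much larger true slope or a much larger λ the comparison can go the other way.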

OLS is prone to overfitting. What is one symptom of overfitting by OLS and when does it occur?

How do Ridge and Lasso counteract the tendency of OLS to overfit and then obtain a lower mse?

They penalise large coefficients, which gives better predictions on the test data and hence a lower mse

When is the tendency to overfit especially marked?

When there is multicollinearity in the training data

In R, how do we prepare the data for Ridge estimation?

How do we create the matrix of observations on explanatory variables and what has it done to the factors?

How do we generate the dependent variable y?

How do we generate a grid?

What are the grid values used for?

Different values of the tuning parameter
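
A sketch of such a grid, in Python for illustration (the lecture builds it in R, and that code is not shown in the text). The end points 10¹⁰ and 10⁻² and the length of 100 are assumptions, chosen because a later card compares λ = 10¹⁰ with λ = 10.

```python
n_points = 100
high_exp, low_exp = 10.0, -2.0   # assumed exponents: lambda runs from 10^10 to 10^-2

# Equally spaced exponents give values that are evenly spread on the log scale.
step = (low_exp - high_exp) / (n_points - 1)
grid = [10.0 ** (high_exp + i * step) for i in range(n_points)]

print(grid[0], grid[-1])  # largest and smallest candidate values of lambda
```

Spacing the values evenly in log λ covers everything from a near-OLS fit to an almost fully shrunk model with a modest number of candidates.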

How do we split the data into training and test data?
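
A sketch of a random split, in Python for illustration (the lecture does this in R, and that code is not shown in the text). The number of observations, the seed and the half-and-half proportion are assumptions.

```python
import random

n_obs = 20                 # hypothetical number of observations
rng = random.Random(1)     # fixed seed so the split is reproducible

# Draw half of the row indices without replacement for training.
train_idx = sorted(rng.sample(range(n_obs), n_obs // 2))
train_set = set(train_idx)

# Every remaining row goes to the test set.
test_idx = [i for i in range(n_obs) if i not in train_set]

print(train_idx)
print(test_idx)
```

The models are then fitted on the training rows only, and the mse is computed on the test rows.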

How do we run the Ridge models?

When λ is very small (log λ is large and negative), what estimates do we obtain?

Approximately the OLS estimates

What happens as λ increases?

The estimated coefficients get smaller

What happens as λ → ∞?

Estimated coefficients shrink towards zero (except the constant)

How do we obtain the mse for when λ = 0?

How does the mse differ for λ = 10¹⁰ and λ = 10? What does it tell us?

How do we find the optimal value of λ using cross-validation?

What does the distribution of mse for each value of log(λ) look like?

How can we find the mse for this optimal λ?

How do we find the estimated coefficients for the optimal λ?

How do we estimate the models using Lasso?

What does the plot of estimated coefficients for each value of logλ look like for Lasso?

How do we obtain the optimal value of λ for Lasso?

What does the plot of mse and logλ look like?

How do we obtain the mse for the optimal λ in Lasso?

How can the estimated coefficients for the optimal λ be found for Lasso?

What can we infer from the Lasso coefficients at the optimal λ?