Why do we need alternatives to least squares?
To improve prediction accuracy and interpretability when many predictors exist.
When does least squares perform well?
When the number of observations is much larger than the number of predictors.
What happens when predictors are close to the number of observations?
The model has high variance and can overfit.
What happens when predictors exceed observations?
There is no unique solution and variance becomes infinite.
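The non-uniqueness is easy to see numerically: with simulated data (toy example) where p > n, the matrix X'X in the normal equations is singular, so least squares has no unique solution.

```python
import numpy as np

# Toy illustration: with more predictors (p) than observations (n),
# X'X is a p x p matrix of rank at most n, hence singular, and the
# least-squares normal equations (X'X) b = X'y have no unique solution.
rng = np.random.default_rng(0)
n, p = 5, 8
X = rng.standard_normal((n, p))

gram = X.T @ X                      # 8 x 8, but rank at most 5
print(np.linalg.matrix_rank(gram))  # 5, not 8: singular
```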
What problem arises with many predictors?
Irrelevant variables reduce interpretability.
Why is interpretability important?
It helps identify which variables truly affect the response.
Why can’t least squares perform variable selection well?
It rarely sets coefficients exactly to zero.
What are the three main approaches to improve least squares?
Subset selection, shrinkage methods, and dimension reduction.
What is subset selection?
Choosing a subset of predictors to build the model.
What is best subset selection?
Fitting a separate model for every possible combination of predictors (2^p subsets) and choosing the best.
What is forward stepwise selection?
Adding predictors one at a time starting from none.
What is backward stepwise selection?
Removing predictors one at a time starting from all.
What is a drawback of subset selection?
It has high variance and can be computationally expensive.
What is multicollinearity?
When predictors are highly correlated.
What happens under multicollinearity?
Coefficients become unstable and vary greatly.
Why does multicollinearity increase variance?
Small data changes cause large coefficient changes.
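A quick simulation (toy data, hypothetical setup) shows the instability: when two predictors are nearly identical, least squares cannot decide how to split the effect between them, so their individual coefficients swing wildly from sample to sample even though their sum stays stable.

```python
import numpy as np

# x2 is an almost exact copy of x1, so the design is nearly collinear.
rng = np.random.default_rng(1)
n = 30
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # nearly collinear predictor
X = np.column_stack([x1, x2])

coefs = []
for _ in range(200):
    y = x1 + rng.standard_normal(n)       # true model uses only x1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(b)
coefs = np.array(coefs)

# Individual coefficients vary enormously across replicates,
# while their sum (the shared effect) barely moves.
print(coefs.std(axis=0), coefs.sum(axis=1).std())
```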
What is shrinkage?
A method that reduces coefficient magnitudes toward zero.
Why use shrinkage?
To reduce variance and improve prediction accuracy.
What is the tradeoff in shrinkage?
Increased bias but reduced variance.
What is regularization?
Another term for shrinkage or penalization.
Why must predictors be standardized before shrinkage?
To ensure penalties apply equally across variables.
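A minimal sketch of that standardization (toy data): each column is centered and divided by its standard deviation, so a shrinkage penalty treats every predictor equally regardless of its original units.

```python
import numpy as np

# Toy design matrix: column 2 is on a ~100x larger scale than column 1,
# so an unscaled penalty would punish its coefficient far less.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 400.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # each column now has mean 0
print(X_std.std(axis=0))   # and standard deviation 1
```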
What is feature scaling?
Transforming variables to comparable scales.
What is ridge regression?
A shrinkage method that penalizes the sum of squared coefficients (an L2 penalty), reducing their size.
What happens to coefficients in ridge regression?
They shrink toward zero but never become exactly zero.
What does the tuning parameter control in ridge?
The strength of shrinkage.
What happens when the tuning parameter is zero?
Ridge becomes ordinary least squares.
What happens when the tuning parameter is very large?
Coefficients approach zero.
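These limiting cases can be sketched with ridge's closed form, (X'X + lam*I)^(-1) X'y, on simulated data (toy example): lam = 0 recovers ordinary least squares, and large lam drives every coefficient toward, but never exactly to, zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, np.round(ridge(X, y, lam), 4))
# Coefficients shrink toward zero as lam grows but never hit it exactly.
```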
Why does ridge reduce overfitting?
It lowers variance by shrinking coefficients.
What is a limitation of ridge regression?
It keeps all predictors in the model.
What is the lasso?
A shrinkage method with an L1 penalty that can set coefficients exactly to zero.
Why is lasso useful?
It performs variable selection.
What type of models does lasso produce?
Sparse models with fewer predictors.
What is the key difference between ridge and lasso?
Lasso can eliminate variables while ridge cannot.
How does lasso improve interpretability?
By removing irrelevant predictors.
What happens to coefficients as the tuning parameter increases in lasso?
More coefficients shrink to zero.
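A sketch of this behavior using coordinate descent with soft-thresholding, one standard way to fit the lasso (assumed objective: (1/2n)||y - Xb||^2 + lam*||b||_1, on simulated sparse data): as lam grows, more coefficients land exactly at zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 6
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0])  # sparse truth
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every effect except predictor j.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_ss[j]
    return beta

for lam in [0.01, 0.5, 5.0]:
    b = lasso(X, y, lam)
    print(lam, int(np.sum(b == 0)), np.round(b, 3))
```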
What is a limitation of lasso with correlated predictors?
It may arbitrarily select one variable and drop others.
What is elastic net?
A method combining ridge and lasso penalties.
Why use elastic net?
To balance variable selection and handling correlated predictors.
What does the elastic net mixing parameter control?
It determines the mix between ridge and lasso.
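The mix can be sketched with one change to the lasso coordinate update (assumed objective: (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2), toy data): alpha = 1 gives the lasso, alpha = 0 gives ridge.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + 0.5 * rng.standard_normal(n)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            # L1 part soft-thresholds the update; L2 part inflates the
            # denominator, shrinking without zeroing.
            beta[j] = soft_threshold(z, n * lam * alpha) / (col_ss[j] + n * lam * (1 - alpha))
    return beta

for alpha in [0.0, 0.5, 1.0]:   # ridge -> mix -> lasso
    print(alpha, np.round(elastic_net(X, y, 0.5, alpha), 3))
```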
When is ridge preferred over lasso?
When many predictors have small effects.
When is lasso preferred over ridge?
When the true model is sparse.
What is cross-validation used for?
Selecting the optimal tuning parameter.
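A sketch of that selection with K-fold cross-validation over a grid of ridge penalties (simulated data, closed-form ridge for simplicity; grid and fold count chosen by hand): each candidate lam is scored by its average held-out error, and the minimizer is picked.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 80, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.standard_normal(n)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)  # all rows not in fold
        b = ridge(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return np.mean(errs)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = [cv_error(X, y, lam) for lam in grid]
best = grid[int(np.argmin(scores))]
print(best, np.round(scores, 3))
```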
Why is tuning parameter selection important?
It determines model performance.
What is the bias-variance tradeoff?
Reducing variance increases bias and vice versa.
How does ridge affect bias and variance?
Increases bias but decreases variance.
Why can ridge outperform least squares?
It reduces test error when variance is high.
What is dimension reduction?
Transforming predictors into a smaller set of variables.
Why use dimension reduction?
To handle multicollinearity and high dimensionality.
What is the idea behind dimension reduction?
Combine information from predictors into new variables.
How is dimension reduction different from lasso?
It does not remove variables but transforms them.
What is principal components regression?
A method using principal components as predictors.
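A sketch of principal components regression via the SVD (simulated data; the number of components M is fixed by hand here, though in practice it would be chosen by cross-validation): center the predictors, project them onto the top M principal components, regress the response on those scores, and map the fit back to the original predictor space.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, M = 60, 5, 2
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, -1.0, 0.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

Z = Xc @ Vt[:M].T                       # scores on the first M components
theta = np.linalg.lstsq(Z, y - y.mean(), rcond=None)[0]
beta_pcr = Vt[:M].T @ theta             # map back to predictor space

print(Z.shape, np.round(beta_pcr, 3))   # (60, 2) scores, 5 coefficients
```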
What is partial least squares?
A method that considers both predictors and response in reduction.
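The contrast with PCA shows up already in the first PLS direction (toy data): the weight vector is built from the predictor-response covariance X'y, so the component is steered toward predictors that actually explain y rather than toward directions of maximal predictor variance.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 60, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 1.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

w = Xc.T @ yc                 # first PLS direction: covariance with y
w = w / np.linalg.norm(w)
t = Xc @ w                    # first PLS score
theta = (t @ yc) / (t @ t)    # regress the response on the score

print(np.round(w, 3), np.round(theta, 3))
```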
Why is dimension reduction useful with correlated predictors?
It preserves shared information instead of discarding variables.
What is the key benefit of regularization methods overall?
Improved prediction accuracy in high-dimensional settings.