Why do we need alternatives to least squares?
To improve prediction accuracy and interpretability when many predictors exist.
When does least squares perform well?
When the number of observations is much larger than the number of predictors.
What happens when predictors are close to the number of observations?
The model has high variance and can overfit.
What happens when predictors exceed observations?
There is no unique solution and variance becomes infinite.
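The non-uniqueness is easy to see numerically: with simulated data (toy example) where p > n, the matrix X'X in the normal equations is singular, so least squares has no unique solution.

```python
import numpy as np

# Toy illustration: with more predictors (p) than observations (n),
# X'X is a p x p matrix of rank at most n, hence singular, and the
# least-squares normal equations (X'X) b = X'y have no unique solution.
rng = np.random.default_rng(0)
n, p = 5, 8
X = rng.standard_normal((n, p))

gram = X.T @ X                      # 8 x 8, but rank at most 5
print(np.linalg.matrix_rank(gram))  # 5, not 8: singular
```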
What problem arises with many predictors?
Irrelevant variables reduce interpretability.
Why is interpretability important?
It helps identify which variables truly affect the response.
Why can’t least squares perform variable selection well?
It rarely sets coefficients exactly to zero.
What are the three main approaches to improve least squares?
Subset selection, shrinkage methods, and dimension reduction.
What is subset selection?
Choosing a subset of predictors to build the model.
What is best subset selection?
Fitting a separate model for every possible combination of predictors (2^p subsets) and choosing the best.
What is forward stepwise selection?
Adding predictors one at a time starting from none.
What is backward stepwise selection?
Removing predictors one at a time starting from all.
What is a drawback of subset selection?
It has high variance and can be computationally expensive.
What is multicollinearity?
When predictors are highly correlated.
What happens under multicollinearity?
Coefficients become unstable and vary greatly.
Why does multicollinearity increase variance?
Small data changes cause large coefficient changes.
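A quick simulation (toy data, hypothetical setup) shows the instability: when two predictors are nearly identical, least squares cannot decide how to split the effect between them, so their individual coefficients swing wildly from sample to sample even though their sum stays stable.

```python
import numpy as np

# x2 is an almost exact copy of x1, so the design is nearly collinear.
rng = np.random.default_rng(1)
n = 30
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # nearly collinear predictor
X = np.column_stack([x1, x2])

coefs = []
for _ in range(200):
    y = x1 + rng.standard_normal(n)       # true model uses only x1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(b)
coefs = np.array(coefs)

# Individual coefficients vary enormously across replicates,
# while their sum (the shared effect) barely moves.
print(coefs.std(axis=0), coefs.sum(axis=1).std())
```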
What is shrinkage?
A method that reduces coefficient magnitudes toward zero.
Why use shrinkage?
To reduce variance and improve prediction accuracy.
What is the tradeoff in shrinkage?
Increased bias but reduced variance.
What is regularization?
Another term for shrinkage or penalization.
Why must predictors be standardized before shrinkage?
To ensure penalties apply equally across variables.
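A minimal sketch of that standardization (toy data): each column is centered and divided by its standard deviation, so a shrinkage penalty treats every predictor equally regardless of its original units.

```python
import numpy as np

# Toy design matrix: column 2 is on a ~100x larger scale than column 1,
# so an unscaled penalty would punish its coefficient far less.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 400.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # each column now has mean 0
print(X_std.std(axis=0))   # and standard deviation 1
```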
What is feature scaling?
Transforming variables to comparable scales.
What is ridge regression?
A shrinkage method that penalizes the sum of squared coefficients (an L2 penalty), reducing their size.
What happens to coefficients in ridge regression?
They shrink toward zero but never become exactly zero.
What does the tuning parameter control in ridge?
The strength of shrinkage.
What happens when the tuning parameter is zero?
Ridge becomes ordinary least squares.
What happens when the tuning parameter is very large?
Coefficients approach zero.
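These limiting cases can be sketched with ridge's closed form, (X'X + lam*I)^(-1) X'y, on simulated data (toy example): lam = 0 recovers ordinary least squares, and large lam drives every coefficient toward, but never exactly to, zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, np.round(ridge(X, y, lam), 4))
# Coefficients shrink toward zero as lam grows but never hit it exactly.
```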
Why does ridge reduce overfitting?
It lowers variance by shrinking coefficients.
What is a limitation of ridge regression?
It keeps all predictors in the model.
What is the lasso?
A shrinkage method with an L1 penalty that can set coefficients exactly to zero.
Why is lasso useful?
It performs variable selection.
What type of models does lasso produce?
Sparse models with fewer predictors.
What is the key difference between ridge and lasso?
Lasso can eliminate variables while ridge cannot.
How does lasso improve interpretability?
By removing irrelevant predictors.
What happens to coefficients as the tuning parameter increases in lasso?
More coefficients shrink to zero.
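A sketch of this behavior using coordinate descent with soft-thresholding, one standard way to fit the lasso (assumed objective: (1/2n)||y - Xb||^2 + lam*||b||_1, on simulated sparse data): as lam grows, more coefficients land exactly at zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 6
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0])  # sparse truth
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every effect except predictor j.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_ss[j]
    return beta

for lam in [0.01, 0.5, 5.0]:
    b = lasso(X, y, lam)
    print(lam, int(np.sum(b == 0)), np.round(b, 3))
```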
What is a limitation of lasso with correlated predictors?
It may arbitrarily select one variable and drop others.
What is elastic net?
A method combining ridge and lasso penalties.
Why use elastic net?
To balance variable selection and handling correlated predictors.
What does the elastic net mixing parameter control?
It determines the mix between ridge and lasso.
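The mix can be sketched with one change to the lasso coordinate update (assumed objective: (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2), toy data): alpha = 1 gives the lasso, alpha = 0 gives ridge.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + 0.5 * rng.standard_normal(n)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            # L1 part soft-thresholds the update; L2 part inflates the
            # denominator, shrinking without zeroing.
            beta[j] = soft_threshold(z, n * lam * alpha) / (col_ss[j] + n * lam * (1 - alpha))
    return beta

for alpha in [0.0, 0.5, 1.0]:   # ridge -> mix -> lasso
    print(alpha, np.round(elastic_net(X, y, 0.5, alpha), 3))
```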
When is ridge preferred over lasso?
When many predictors have small effects.
When is lasso preferred over ridge?
When the true model is sparse.
What is cross-validation used for?
Selecting the optimal tuning parameter.
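A sketch of that selection with K-fold cross-validation over a grid of ridge penalties (simulated data, closed-form ridge for simplicity; grid and fold count chosen by hand): each candidate lam is scored by its average held-out error, and the minimizer is picked.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 80, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.standard_normal(n)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)  # all rows not in fold
        b = ridge(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return np.mean(errs)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = [cv_error(X, y, lam) for lam in grid]
best = grid[int(np.argmin(scores))]
print(best, np.round(scores, 3))
```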
Why is tuning parameter selection important?
It determines model performance.
What is the bias-variance tradeoff?
Reducing variance increases bias and vice versa.
How does ridge affect bias and variance?
Increases bias but decreases variance.
Why can ridge outperform least squares?
It reduces test error when variance is high.
What is dimension reduction?
Transforming predictors into a smaller set of variables.
Why use dimension reduction?
To handle multicollinearity and high dimensionality.
What is the idea behind dimension reduction?
Combine information from predictors into new variables.
How is dimension reduction different from lasso?
It does not remove variables but transforms them.
What is principal components regression?
A method using principal components as predictors.
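A sketch of principal components regression via the SVD (simulated data; the number of components M is fixed by hand here, though in practice it would be chosen by cross-validation): center the predictors, project them onto the top M principal components, regress the response on those scores, and map the fit back to the original predictor space.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, M = 60, 5, 2
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, -1.0, 0.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

Z = Xc @ Vt[:M].T                       # scores on the first M components
theta = np.linalg.lstsq(Z, y - y.mean(), rcond=None)[0]
beta_pcr = Vt[:M].T @ theta             # map back to predictor space

print(Z.shape, np.round(beta_pcr, 3))   # (60, 2) scores, 5 coefficients
```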
What is partial least squares?
A method that considers both predictors and response in reduction.
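The contrast with PCA shows up already in the first PLS direction (toy data): the weight vector is built from the predictor-response covariance X'y, so the component is steered toward predictors that actually explain y rather than toward directions of maximal predictor variance.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 60, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 1.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

w = Xc.T @ yc                 # first PLS direction: covariance with y
w = w / np.linalg.norm(w)
t = Xc @ w                    # first PLS score
theta = (t @ yc) / (t @ t)    # regress the response on the score

print(np.round(w, 3), np.round(theta, 3))
```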
Why is dimension reduction useful with correlated predictors?
It preserves shared information instead of discarding variables.
What is the key benefit of regularization methods overall?
Improved prediction accuracy in high-dimensional settings.