05-10 Ridge

= shrinkage methods

prevent overfitting in models by adding information and constraints to estimators.
constrain the range of values an estimator can take → decrease the model's variance, thus stabilizing predictions and improving generalization.

By restricting the values that regression coefficients can take, we can control their size and, in turn, manage the model's variance.
The goal is to reduce the flexibility of the model while still maintaining a good fit to the data, preventing overfitting.

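adds a penalty term to the least-squares criterion: minimize RSS + λ * Σ(β_j^2), i.e. the residual sum of squares plus λ times the sum of the squared regression coefficients. The intercept is normally left out of the penalty.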

penalty term shrinks the coefficients toward zero, so no single coefficient can grow excessively large, which controls variance.
The tuning parameter (λ) adjusts the strength of this penalty and must be chosen carefully to optimize model performance.
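
A minimal sketch of the penalty in action, assuming NumPy and simulated data (the data, the helper name ridge_fit, and the λ grid are illustrative, not from these notes). It uses the closed-form ridge solution on centered data so the intercept is not penalized, and records how the sum of squared coefficients falls as λ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Return (intercept, beta) minimizing RSS + lam * sum(beta_j^2).

    X and y are centered first, so the intercept is not penalized.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    # Closed-form solution of (Xc'Xc + lam * I) beta = Xc'yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Simulated data (assumption: three predictors, Gaussian noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=50)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = []
for lam in lambdas:
    _, beta = ridge_fit(X, y, lam)
    norms.append(float(np.sum(beta ** 2)))
    print(f"lambda={lam:7.1f}  beta={np.round(beta, 3)}  sum(beta^2)={norms[-1]:.4f}")
```

With λ = 0 this reduces to ordinary least squares; the printed coefficients get smaller at every step of the grid but stay nonzero.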

Ridge regression does not yield coefficients that are exactly zero, so all predictors stay in the model.
With λ = 0 the fit is ordinary least squares; as λ increases, the penalty dominates the criterion and the coefficients are pulled toward zero, which stabilizes the estimates.
A larger λ leads to a greater penalty, which reduces the coefficients more aggressively.

Increasing λ reduces variance but increases bias. The best λ is the one where the sum of the two, the test MSE, is smallest.
Use cross-validation to choose λ: fit the model over a grid of λ values, estimate the error of each on held-out folds, and pick the λ with the lowest cross-validated error.
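
A rough sketch of choosing λ by k-fold cross-validation, assuming NumPy and simulated data (the helper names ridge_coefs and cv_mse, the grid, and the data are hypothetical). Each fold is standardized with training-fold statistics only, so no information from the held-out fold leaks into the fit.

```python
import numpy as np

def ridge_coefs(X, y, lam):
    # Assumes the columns of X and the vector y are already centered.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean held-out squared error of ridge with penalty lam over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # Standardize with statistics from the training fold only
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        y_mu = y[train].mean()
        X_tr = (X[train] - mu) / sd
        X_te = (X[fold] - mu) / sd
        beta = ridge_coefs(X_tr, y[train] - y_mu, lam)
        pred = y_mu + X_te @ beta
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

# Simulated data (assumption: many predictors relative to n, few that matter)
rng = np.random.default_rng(1)
n, p = 40, 30
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ true_beta + rng.normal(scale=2.0, size=n)

lambdas = np.logspace(-3, 3, 13)
scores = [cv_mse(X, y, lam) for lam in lambdas]
best_lambda = float(lambdas[int(np.argmin(scores))])
print("best lambda:", best_lambda)
```

The same seed is used for every λ so that all candidates are scored on identical folds, which keeps the comparison fair.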

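Predictors should be standardized to have a mean of zero and a standard deviation of one prior to ridge regression. Unlike least squares, the ridge fit depends on the scale of the predictors, because the penalty treats every coefficient alike: a predictor measured in small units needs a large coefficient and is penalized more heavily.
Standardization puts all predictors on the same scale, so the amount of shrinkage does not depend on an arbitrary choice of units and the coefficients are comparable.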
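
To illustrate the scale dependence, here is a small sketch assuming NumPy and simulated data (the helper ridge_predict and the factor of 1000 are illustrative choices). The same predictor is expressed in two different units; without standardization the ridge fitted values change, with standardization they do not.

```python
import numpy as np

def ridge_predict(X, y, lam, standardize):
    """Fitted values from ridge on centered (and optionally scaled) predictors."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0) if standardize else np.ones(X.shape[1])
    Z = (X - mu) / sd
    y_mu = y.mean()
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mu))
    return y_mu + Z @ beta

# Simulated data (assumption: two predictors with equal true effects)
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=60)

# Same information, but the first predictor is recorded in different units
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0

lam = 10.0
raw_gap = np.max(np.abs(ridge_predict(X, y, lam, False)
                        - ridge_predict(X_rescaled, y, lam, False)))
std_gap = np.max(np.abs(ridge_predict(X, y, lam, True)
                        - ridge_predict(X_rescaled, y, lam, True)))
print(f"max change in fitted values, unstandardized: {raw_gap:.4f}")
print(f"max change in fitted values, standardized:   {std_gap:.2e}")
```

The unstandardized gap is clearly nonzero, while the standardized gap is at the level of floating-point rounding.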

Although ridge regression effectively controls variance, its coefficients never reach exactly zero, so it performs no feature selection and the fitted model is harder to interpret.
Lasso regression modifies the ridge approach by penalizing the sum of the absolute values of the coefficients (λ * Σ|β_j|) instead of the sum of their squares. This penalty can set some coefficients exactly to zero, so the lasso performs variable selection as part of the fit.
Understanding the distinctions between ridge and lasso helps in making informed decisions about model choice based on bias, variance, and interpretation needs.
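
The difference between the two penalties is easiest to see in a simplified special case, sketched here under strong assumptions: n = p, X is the identity matrix, and there is no intercept, so the least-squares coefficients equal the observations. In that setting ridge divides every coefficient by (1 + λ), while the lasso subtracts λ/2 from its absolute value and sets it to zero if the result would be negative. The function names and example values are illustrative.

```python
import numpy as np

def ridge_shrink(ols, lam):
    # Minimizes sum((y_j - b_j)^2) + lam * sum(b_j^2): proportional shrinkage
    return ols / (1.0 + lam)

def lasso_shrink(ols, lam):
    # Minimizes sum((y_j - b_j)^2) + lam * sum(|b_j|): soft-thresholding at lam / 2
    return np.sign(ols) * np.maximum(np.abs(ols) - lam / 2.0, 0.0)

ols = np.array([3.0, -1.2, 0.4, -0.1])  # hypothetical least-squares coefficients
lam = 1.0

ridge = ridge_shrink(ols, lam)  # every entry halved, none exactly zero
lasso = lasso_shrink(ols, lam)  # entries with |ols| <= 0.5 become exactly zero

print("ridge:", ridge)
print("lasso:", lasso)
print("nonzero ridge coefficients:", int(np.count_nonzero(ridge)))
print("nonzero lasso coefficients:", int(np.count_nonzero(lasso)))
```

Ridge keeps all four coefficients, only smaller; the lasso keeps the two large ones and drops the two small ones, which is the variable selection described above.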