05-10 Ridge
Regularization
= shrinkage methods
prevent overfitting in models by adding information and constraints to estimators.
constrain the range of values an estimator can take → decrease the model's variance, thus stabilizing predictions and improving generalization.
Understanding Regularization Concepts
Mechanics of Regularization Techniques
Shrinkage through Constraints
By restricting the values that regression coefficients can take, we can control their size and, in turn, manage the model's variance.
The goal is to reduce the flexibility of the model while still maintaining a good fit to the data, preventing overfitting.
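Written as an optimization problem, the constraint idea above can be sketched as follows. This assumes a linear model with intercept β_0 and coefficients β_1..β_p fit by least squares, and a budget s on the sum of squared coefficients (s is a symbol introduced here for illustration, not taken from the notes):

```latex
\min_{\beta_0,\beta_1,\dots,\beta_p} \; \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s
```

A smaller s leaves less room for the coefficients, so the model is less flexible; a very large s places no effective restriction and recovers the least squares fit.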


Ridge Regression
adds a penalty term, λ * Σ(β_j^2) (λ times the sum of the squared regression coefficients), to the residual sum of squares (RSS) that least squares minimizes; the intercept is not penalized.
the penalty term shrinks the coefficients toward zero, which keeps them from becoming excessively large and so controls variance.
The tuning parameter (λ) adjusts the strength of this penalty and must be chosen carefully to optimize model performance.
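A minimal sketch of the ridge fit described above, assuming NumPy and the closed-form solution of the penalized least squares problem. The function name ridge_fit and the synthetic data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize RSS + lam * sum(beta_j^2); the intercept is left unpenalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering removes the intercept from the problem
    p = X.shape[1]
    # Normal equations with the penalty added to the diagonal: (X'X + lam*I) beta = X'y
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(0)               # synthetic data, for illustration only
X = rng.normal(size=(50, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

_, beta_ols = ridge_fit(X, y, lam=0.0)       # lam = 0 reduces to ordinary least squares
_, beta_ridge = ridge_fit(X, y, lam=10.0)
print("least squares:", beta_ols)
print("ridge, lam=10:", beta_ridge)
```

With lam = 0 the penalty vanishes and the result matches least squares; any lam > 0 gives a coefficient vector with a smaller sum of squares.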
Key Concepts of Ridge Regression
Ridge regression shrinks coefficients toward zero but does not set any of them exactly to zero, which has the advantage of keeping all predictors in the model.
As λ increases, the penalty carries more weight relative to the fit to the data, so keeping the coefficients small matters more and the model becomes more stable.
A larger λ leads to a greater penalty, which reduces the coefficients more aggressively.
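The effect of λ on coefficient size can be checked numerically. This is a sketch under assumptions: synthetic data, predictors already on a common scale, and an arbitrary grid of λ values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)               # synthetic data, for illustration only
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=100)

# Center so the intercept drops out; these predictors already share one scale.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]    # arbitrary grid, smallest to largest
path = []
for lam in lambdas:
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ yc)
    path.append(beta)
    print(f"lam={lam:>7}: beta={np.round(beta, 3)}  sum of squares={np.sum(beta**2):.3f}")

# Sum of squared coefficients at each lam, in the same order as `lambdas`.
sizes = [float(np.sum(b**2)) for b in path]
```

The sum of squared coefficients falls at every step up in λ, yet no individual coefficient becomes exactly zero.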
Bias-Variance Trade-off in Ridge Regression
Increasing λ reduces variance but increases bias. The best λ is the one that minimizes test MSE, i.e. where the drop in variance most outweighs the added bias.
Use cross-validation to determine the best λ: evaluate a grid of candidate λ values across different training/validation splits and keep the one with the lowest average validation error.
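A minimal sketch of choosing λ by k-fold cross-validation, assuming NumPy only. The helper names ridge_coefs and cv_mse, the λ grid, and the synthetic data are illustrative assumptions; a library routine would normally replace this hand-written loop.

```python
import numpy as np

def ridge_coefs(X, y, lam):
    """Ridge coefficients for centered data: solve (X'X + lam*I) beta = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Average held-out MSE over k folds for one value of lam."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Center with training-fold statistics only, so the held-out fold stays unseen.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        beta = ridge_coefs(X[train] - x_mean, y[train] - y_mean, lam)
        pred = y_mean + (X[fold] - x_mean) @ beta
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(2)               # synthetic data, for illustration only
X = rng.normal(size=(60, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_mse(X, y, lam) for lam in lambdas}   # same folds for every lam
best_lam = min(scores, key=scores.get)
print("CV MSE by lam:", scores)
print("selected lam:", best_lam)
```

The fixed seed gives every candidate λ the same folds, so differences in the scores reflect λ rather than the luck of the split.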

Importance of Standardization
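Predictors should be standardized to have a mean of zero and a standard deviation of one prior to ridge regression, because the penalty depends on the size of each coefficient and therefore on the units of its predictor; after standardization the coefficients are comparable and are penalized on equal terms.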
Standardization prevents distortion of results when predictors are on different scales, which could lead to misleading interpretations.
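The scale sensitivity can be demonstrated with a small experiment. This sketch assumes NumPy and synthetic data; the names standardize, ridge_fitted, and X_mm are illustrative. One predictor is multiplied by 1000 (as if its units changed), and the ridge predictions are compared with and without standardization.

```python
import numpy as np

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_fitted(X, y, lam):
    """In-sample ridge predictions with an unpenalized intercept."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y.mean() + Xc @ beta

rng = np.random.default_rng(3)               # synthetic data, for illustration only
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=50)

X_mm = X * np.array([1000.0, 1.0, 1.0])      # same data, first predictor in different units

# Largest change in predictions caused purely by the change of units.
raw_gap = np.max(np.abs(ridge_fitted(X, y, 10.0) - ridge_fitted(X_mm, y, 10.0)))
std_gap = np.max(np.abs(ridge_fitted(standardize(X), y, 10.0)
                        - ridge_fitted(standardize(X_mm), y, 10.0)))
print("gap without standardization:", raw_gap)
print("gap with standardization:   ", std_gap)
```

Without standardization the predictions change when the units change, even though the data carry the same information; with standardization the gap is zero up to rounding error.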

Limitations and Transition to Lasso
Although ridge regression effectively controls variance, its coefficients never reach exactly zero, so it performs no feature selection and the fitted model is harder to interpret.
Lasso regression modifies the ridge regression approach by penalizing the sum of the absolute values of the coefficients (λ * Σ|β_j|) instead of the sum of their squares, which can lead to some coefficients being exactly zero, hence inherently providing variable selection.
Understanding the distinctions between ridge and lasso helps in making informed decisions about model choice based on bias, variance, and interpretation needs.
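The contrast between the two penalties can be sketched in code. This assumes NumPy, synthetic data in which only two of five predictors matter, and a hand-written coordinate descent solver for the lasso objective RSS + λ * Σ|β_j|. The names soft_threshold, lasso_cd, and ridge are illustrative, and coordinate descent is one common way to fit the lasso, not a method taken from the notes.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for RSS + lam * sum(|beta_j|) on centered data."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = np.sum(X**2, axis=0)            # x_j'x_j for each predictor
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with predictor j's current contribution added back in.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ partial_resid, lam / 2.0) / col_ss[j]
    return beta

def ridge(X, y, lam):
    """Ridge coefficients for centered data."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(4)               # synthetic data, for illustration only
X = rng.normal(size=(100, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the predictors
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0])   # only predictors 0 and 3 matter
y = X @ true_beta + rng.normal(scale=0.3, size=100)
y = y - y.mean()

beta_lasso = lasso_cd(X, y, lam=40.0)
beta_ridge = ridge(X, y, lam=40.0)
print("lasso:", beta_lasso)
print("ridge:", beta_ridge)
```

The soft-thresholding step is what produces exact zeros: a predictor whose correlation with the remaining residual is below the threshold has its coefficient set to 0, whereas the ridge solution only scales coefficients down.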