Flashcards on Bias-Variance Tradeoff in Machine Learning
What are the two primary sources of error that affect generalization in supervised machine learning?
Bias and variance.
Define bias in the context of machine learning models.
Bias refers to the systematic error introduced by approximating a real-world problem with a simplified model.
What are the characteristics of a model with high bias?
Oversimplified assumptions, high training and test errors, and poor performance on both seen and unseen data.
Define variance in the context of machine learning models.
Variance refers to a model's sensitivity to fluctuations in the training data, which leads it to model random noise as if it were signal.
What are the characteristics of a model with high variance?
Complex models with many parameters, low training error but high test error, and poor generalization to new data.
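A small simulation makes the two definitions above concrete: fit the same model class on many independently drawn training sets, then measure how far the average prediction sits from the truth (bias) and how much predictions scatter across fits (variance). The target function, sample sizes, and polynomial degrees below are illustrative assumptions, not part of the cards.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_f = np.sin                 # assumed "true" function for the simulation
x0 = np.array([[1.0]])          # probe point where bias and variance are measured

for degree in (1, 12):          # simple vs. complex model class
    preds = []
    for _ in range(200):        # many independent training sets
        X = rng.uniform(-3, 3, size=(30, 1))
        y = true_f(X[:, 0]) + rng.normal(scale=0.3, size=30)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds.append(model.fit(X, y).predict(x0)[0])
    preds = np.array(preds)
    bias = preds.mean() - true_f(x0[0, 0])   # systematic offset from the truth
    print(f"degree {degree}: bias {bias:+.3f}, variance {preds.var():.3f}")
```

Typically the degree-1 fit shows the larger bias and the degree-12 fit the larger variance, matching the two cards above.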
Explain the concept of underfitting in terms of bias and variance.
Underfitting is characterized by high bias and low variance.
Explain the concept of overfitting in terms of bias and variance.
Overfitting is characterized by low bias and high variance.
What is the key to achieving an optimal model in the bias-variance tradeoff?
Achieving a balance between bias and variance that minimizes the total expected error.
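For squared loss this balance can be stated precisely: the expected test error decomposes into three terms (a standard identity, included here for reference; \hat{f} is the fitted model, f the true function, \sigma^2 the noise level):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible error}}
```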
How does regularization help in managing bias and variance?
Regularization adds a penalty term to the loss function that discourages overly complex models, reducing variance at the cost of a small increase in bias.
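Concretely, for a linear model with weights w, the two penalized objectives discussed in the next cards take the following form, where \lambda \ge 0 controls the penalty strength (standard notation, not from the cards):

```latex
L_{\text{lasso}}(w) = \sum_{i}\left(y_i - x_i^{\top} w\right)^{2} + \lambda \lVert w \rVert_{1},
\qquad
L_{\text{ridge}}(w) = \sum_{i}\left(y_i - x_i^{\top} w\right)^{2} + \lambda \lVert w \rVert_{2}^{2}
```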
How does L1 Regularization (Lasso) work?
It penalizes the sum of the absolute values of the coefficients, encouraging sparsity: some coefficients are driven exactly to zero, effectively eliminating those features.
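A minimal sketch of that sparsity effect using scikit-learn; the synthetic data and the alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 10 features, only 2 actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
# The L1 penalty typically drives the 8 irrelevant coefficients exactly to zero
print(np.round(lasso.coef_, 2))
```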
How does L2 Regularization (Ridge) work?
It penalizes the sum of the squared coefficients, shrinking large weights toward smaller, more evenly distributed values without driving them exactly to zero.
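A matching sketch for Ridge on the same kind of synthetic data (again, the data and alpha are assumptions); comparing against plain least squares shows shrinkage without exact zeros:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
# The L2 penalty pulls every coefficient toward zero, but none exactly to zero
print(np.round(ols.coef_, 2))
print(np.round(ridge.coef_, 2))
```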
How does cross-validation help in managing bias and variance?
Cross-validation estimates how well a model's results will generalize to an independent dataset, helping detect overfitting or underfitting and guiding the choice of model complexity.
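A sketch of cross-validation used for complexity selection, with polynomial degree standing in for model complexity (the data and the candidate degrees are illustrative assumptions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)

for degree in (1, 3, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # 5-fold CV estimates out-of-sample error; the best degree minimizes it
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: CV mean squared error {mse:.3f}")
```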
How do ensemble methods help in balancing bias and variance?
By combining the predictions of multiple models: averaging independently trained models reduces variance, while sequentially correcting errors reduces bias.
How does Bagging (Bootstrap Aggregating) reduce variance?
By averaging predictions from multiple models trained on different subsets of the data.
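A minimal bagging sketch with scikit-learn; the base estimator, ensemble size, and data are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

# Each tree is trained on a bootstrap sample of the data; averaging the
# trees' predictions cancels much of their individual noise (variance)
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100).fit(X, y)
print(bagged.predict([[0.5]]))
```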
How does Boosting reduce bias?
By sequentially training models, each trying to correct the errors of the previous one.
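A matching boosting sketch (the hyperparameters and data are again illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

# Each shallow tree is fit to the residual errors of the ensemble so far,
# so the combined model's bias shrinks as stages are added
boosted = GradientBoostingRegressor(n_estimators=200, max_depth=2,
                                    learning_rate=0.1).fit(X, y)
print(boosted.predict([[0.5]]))
```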
How does expanding training data help in managing bias and variance?
Increasing the size of the training dataset reduces variance, since the model has less opportunity to fit noise, and improves generalization to unseen data; it does little to reduce bias.
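This effect can be read off a learning curve; a sketch using scikit-learn's learning_curve with an (assumed) high-variance base model:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

sizes, train_scores, test_scores = learning_curve(
    DecisionTreeRegressor(), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),
    scoring="neg_mean_squared_error")
# For a high-variance model, test error keeps falling as training data grows
for n, mse in zip(sizes, -test_scores.mean(axis=1)):
    print(f"{n} training samples: test MSE {mse:.3f}")
```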
What are the indicators of high bias?
High training error, high test error, and minimal improvement with more data.
What are the remedies for high bias?
Increase model complexity, add more relevant features, and reduce regularization.
What are the indicators of high variance?
Low training error, high test error, and performance that improves with more data.
What are the remedies for high variance?
Simplify the model, increase regularization, use ensemble methods, and gather more training data.
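The train/test error patterns in the four cards above can be reproduced directly; in this sketch a degree-1 polynomial stands in for a high-bias model and a degree-15 polynomial for a high-variance one (both choices, and the data, are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 15):   # 1 ~ high bias, 15 ~ high variance
    model = make_pipeline(PolynomialFeatures(degree),
                          LinearRegression()).fit(X_tr, y_tr)
    print(f"degree {degree}: "
          f"train MSE {mean_squared_error(y_tr, model.predict(X_tr)):.3f}, "
          f"test MSE {mean_squared_error(y_te, model.predict(X_te)):.3f}")
```

Expected pattern: degree 1 shows high error on both splits (high bias), while degree 15 typically shows low training error but noticeably higher test error (high variance).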
In a typical bias-variance tradeoff plot of error versus model complexity, what does the left side (high bias) represent?
Models that are too simple, leading to underfitting.
In the context of the bias-variance tradeoff, what does the right side of the plot (high variance) represent?
Models that are too complex, leading to overfitting.
What is the 'optimal point' in the bias-variance tradeoff plot?
A sweet spot at intermediate complexity where the total error (bias squared plus variance, plus irreducible noise) is minimized, rather than where either component alone is smallest.
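The plot these three cards describe can be regenerated with a validation curve over polynomial degree (the model family, data, and degree range are illustrative assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

degrees = np.arange(1, 13)
model = make_pipeline(PolynomialFeatures(), LinearRegression())
train_scores, test_scores = validation_curve(
    model, X, y, param_name="polynomialfeatures__degree",
    param_range=degrees, cv=5, scoring="neg_mean_squared_error")

plt.plot(degrees, -train_scores.mean(axis=1), label="training error")
plt.plot(degrees, -test_scores.mean(axis=1), label="test (CV) error")
plt.xlabel("model complexity (polynomial degree)")
plt.ylabel("mean squared error")   # test error traces the familiar U shape
plt.legend()
plt.show()
```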
What is a key takeaway regarding the relationship between bias and variance?
There is an inherent tradeoff: reducing bias typically increases variance, and vice versa.
Why is understanding and managing the bias-variance tradeoff crucial?
For developing robust machine learning models that generalize well to new, unseen data.