12: Decision trees

5 Terms

1

What are advantages of decision trees?

Trees are very flexible and can accommodate different types of responses (quantitative as well as qualitative), resulting in regression and classification trees, respectively.

Different variable types can also be used as explanatory variables, and no specific form for the underlying relationship is assumed. A brief sketch of both response types follows below.
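
A minimal sketch, assuming scikit-learn and NumPy are available, of the same tree machinery fitting a quantitative response (regression tree) and a qualitative response (classification tree); the toy data and variable names are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)

# Mixed explanatory variables: one quantitative, one binary indicator
# (a categorical variable already encoded as 0/1).
X = np.column_stack([rng.normal(size=200), rng.integers(0, 2, size=200)])

# Quantitative response -> regression tree
y_quant = 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, y_quant)

# Qualitative response -> classification tree
y_qual = (y_quant > y_quant.mean()).astype(int)
clf_tree = DecisionTreeClassifier(max_depth=3).fit(X, y_qual)

print(reg_tree.predict(X[:3]))  # continuous predictions
print(clf_tree.predict(X[:3]))  # class labels
```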

2

What is bagging?

1. Bagging (Bootstrap Aggregating)

Key Idea: Reduce variance by averaging multiple independent models trained on bootstrapped datasets.

  • How it works:

    • Generate multiple bootstrapped samples from the original dataset.

    • Train a separate model (usually decision trees) on each sample.

    • Aggregate predictions (average for regression, majority vote for classification).

  • Purpose: Decrease overfitting and improve stability.

  • Strengths: Reduces variance, works well with high-variance models (e.g., deep decision trees).

  • Weaknesses: Does not improve bias significantly.

Example: Bagging Decision Trees
Averaging multiple decision trees trained on different subsets of the data to make a more stable prediction.
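
A minimal sketch of the three steps above (bootstrap, fit a tree per sample, average), assuming scikit-learn and NumPy; the synthetic dataset and the number of bootstrap samples are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

# 1. Generate bootstrapped samples and 2. train one deep tree on each.
trees = []
for _ in range(50):
    idx = rng.choice(len(X), size=len(X), replace=True)  # sample with replacement
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# 3. Aggregate: average the individual tree predictions (regression case;
# classification would use a majority vote instead).
bagged_pred = np.mean([t.predict(X[:5]) for t in trees], axis=0)
print(bagged_pred)
```

In practice scikit-learn's BaggingRegressor / BaggingClassifier wrap these steps up, but the manual loop makes the variance-averaging idea explicit.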

3

What is random forest?

2. Random Forest

Key Idea: An extension of bagging that introduces additional randomness to reduce correlation among trees.

  • How it works:

    • Same as bagging, but each tree is built using a random subset of features at each split (not all features).

    • This further decorrelates the trees, making the ensemble more robust.

  • Purpose: Reduces both variance and correlation between trees.

  • Strengths: Prevents overfitting better than individual decision trees.

  • Weaknesses: Can be computationally expensive for large datasets.
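
A minimal sketch with scikit-learn's RandomForestClassifier, where max_features controls the random subset of features considered at each split; the synthetic dataset and parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features="sqrt": each split considers only a random subset of features,
# which decorrelates the trees compared with plain bagging.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```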

4

What does boosting do?

3. Boosting

Key Idea: Reduce both bias and variance by training models sequentially, where each new model focuses on the mistakes of the previous ones.

  • How it works:

    • Train a weak model (e.g., a shallow tree).

    • Identify misclassified instances and assign them higher weights.

    • Train the next model to correct those mistakes.

    • Combine models to make a final prediction.

  • Purpose: Reduces bias, improves predictive power, and works well with weak learners.

  • Strengths: High accuracy, especially for complex datasets.

  • Weaknesses: More prone to overfitting than bagging/random forest, sensitive to noise.
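
A minimal sketch using AdaBoost as one concrete boosting algorithm (it reweights misclassified instances, matching the steps above), assuming scikit-learn; the synthetic data and parameter values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost's default weak learner is a depth-1 decision tree (a stump).
# Each round upweights the instances the previous rounds misclassified,
# and the final prediction is a weighted combination of all rounds.
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))
```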

5

When do you use bagging, random forest, or boosting?

  • Bagging → When you have a high-variance model (e.g., deep decision trees) and want to stabilize predictions.

  • Random Forest → When you need a strong, robust model that handles high-dimensional data with reduced correlation.

  • Boosting → When you need the highest accuracy and can afford careful tuning to avoid overfitting.
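
A minimal side-by-side sketch comparing the three ensembles by cross-validated accuracy, assuming scikit-learn; the synthetic dataset, estimator counts, and resulting scores are purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=0)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Cross-validated accuracy gives a rough comparison on this toy problem.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>13}: {scores.mean():.3f}")
```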
