L8 - Tree Ensemble Methods


Last updated 7:55 PM on 4/14/26

72 Terms

1
New cards

What is an ensemble method?

A method that combines multiple models to produce a stronger overall model.

2
New cards

What are weak learners?

Simple models that on their own perform only slightly better than random guessing.

3
New cards

What is the intuition behind ensemble methods?

Combining diverse independent models improves prediction accuracy.

4
New cards

What is the “wisdom of the crowds” idea?

Aggregated predictions from many models are more accurate than individual predictions.

5
New cards

Why are decision trees commonly used in ensembles?

They have low bias but high variance and capture complex interactions.

6
New cards

What are the three main ensemble methods?

Bagging, random forests, and boosting.

7
New cards

Why do decision trees have high variance?

Small changes in data can produce very different trees.

8
New cards

What is the key idea behind bagging?

Reduce variance by averaging many independently trained models.

9
New cards

What does bagging stand for?

Bootstrap aggregation.

10
New cards

What is the bootstrap?

A resampling method that samples observations with replacement.

11
New cards

Why is bootstrap useful?

It approximates sampling from the population using one dataset.

12
New cards

How are bootstrap samples constructed?

By randomly sampling observations with replacement from the dataset.
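The sampling step above can be sketched in a few lines of Python; `random.choices` draws with replacement, and the dataset values here are made up for illustration:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

data = [10, 20, 30, 40, 50]          # a toy dataset
boot = random.choices(data, k=len(data))  # sample n observations WITH replacement

# The bootstrap sample has the same size as the original dataset,
# and because sampling is with replacement, values can repeat.
print(boot)
```

Each call to `random.choices` produces one bootstrap sample; bagging would repeat this B times, training one tree per sample.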

13
New cards

Why can observations repeat in bootstrap samples?

Sampling is done with replacement.

14
New cards

What is the purpose of bootstrap in bagging?

To generate multiple training datasets.

15
New cards

What happens after generating bootstrap samples in bagging?

A model is trained on each sample.

16
New cards

How are predictions combined in bagging for regression?

By averaging predictions.

17
New cards

How are predictions combined in bagging for classification?

By majority voting.

18
New cards

Why does averaging reduce variance?

Averaging independent estimates reduces variability.
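This variance reduction can be checked numerically: for independent estimates with variance σ², the average of n of them has variance σ²/n. A quick simulation (with made-up Gaussian noise) illustrates it:

```python
import random

random.seed(0)

def var(xs):
    """Sample variance (population form) of a list."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# 2000 single estimates vs. 2000 averages of 25 independent estimates,
# all drawn from the same N(0, 1) noise distribution.
singles = [random.gauss(0, 1) for _ in range(2000)]
means = [sum(random.gauss(0, 1) for _ in range(25)) / 25 for _ in range(2000)]

print(var(singles))  # close to 1
print(var(means))    # close to 1/25 = 0.04
```

Averaging 25 independent estimates cuts the variance by roughly a factor of 25, which is exactly the effect bagging exploits.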

19
New cards

What type of trees are used in bagging?

Deep, unpruned trees.

20
New cards

Why use deep trees in bagging?

They have low bias but high variance, which averaging reduces.

21
New cards

What is the main benefit of bagging?

Variance reduction without increasing bias.

22
New cards

What is a limitation of bagging?

Loss of interpretability due to many trees.

23
New cards

What is out-of-bag (OOB) data?

Observations not included in a bootstrap sample.

24
New cards

What proportion of data is typically OOB?

About one-third of observations (roughly a fraction 1/e ≈ 37%).
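The one-third figure follows from the probability that a given observation is never drawn in n draws with replacement, (1 − 1/n)ⁿ, which tends to 1/e as n grows:

```python
import math

n = 1000
p_oob = (1 - 1 / n) ** n  # probability an observation is left out of one bootstrap sample

print(p_oob)          # ≈ 0.3677
print(math.exp(-1))   # limit as n → ∞: 1/e ≈ 0.3679
```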

25
New cards

How is OOB error estimated?

By predicting each observation using trees where it was not included.

26
New cards

Why is OOB error useful?

It provides validation without a separate test set.

27
New cards

What is the relationship between OOB error and cross-validation?

OOB approximates leave-one-out cross-validation.

28
New cards

What is variable importance in bagging?

A measure of how much each variable reduces prediction error.

29
New cards

How is variable importance computed?

By averaging reduction in error across trees.

30
New cards

What does a high variable importance indicate?

The predictor strongly influences predictions.

31
New cards

Why does bagging reduce interpretability?

Because results come from many aggregated trees.

32
New cards

What is the limitation of bagging regarding correlation?

Trees can be highly correlated if strong predictors dominate.

33
New cards

Why is correlation between trees a problem?

It reduces the effectiveness of variance reduction.

34
New cards

What is a random forest?

An extension of bagging that decorrelates trees.

35
New cards

How do random forests reduce correlation?

By selecting a random subset of predictors at each split.

36
New cards

What is the parameter m in random forests?

Number of predictors considered at each split.

37
New cards

What is a typical choice for m?

Square root of total predictors.
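The per-split predictor sampling described above can be sketched as follows; the predictor names are made up, and a real implementation would redraw the subset at every split of every tree:

```python
import math
import random

random.seed(2)

predictors = [f"x{i}" for i in range(16)]        # p = 16 toy predictors
m = int(math.sqrt(len(predictors)))              # typical choice: m = sqrt(p) = 4

# At each split, only these m randomly chosen predictors compete for the split.
candidates = random.sample(predictors, m)        # sample WITHOUT replacement
print(candidates)
```

Because a different subset is drawn at each split, a dominant predictor cannot appear in every tree's top split, which decorrelates the trees.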

38
New cards

What happens if m equals total predictors?

Random forest becomes equivalent to bagging.

39
New cards

What is the key advantage of random forests over bagging?

Lower correlation between trees leading to better variance reduction.

40
New cards

Does each tree in random forests use all predictors?

Yes, each tree may use any predictor overall, but only a random subset of m predictors is considered at each individual split.

41
New cards

How does randomness improve performance in random forests?

It produces more diverse trees.

42
New cards

What happens to bias in random forests?

Slight increase compared to bagging.

43
New cards

What happens to variance in random forests?

Reduced compared to bagging.

44
New cards

What is boosting?

An ensemble method that builds models sequentially.

45
New cards

How does boosting differ from bagging?

Models are built sequentially instead of independently.

46
New cards

What type of trees are used in boosting?

Shallow trees.

47
New cards

Why are shallow trees used in boosting?

To control complexity and reduce overfitting.

48
New cards

What is the key idea of boosting?

Learn from previous errors and improve predictions iteratively.

49
New cards

What does boosting focus on at each step?

Residual errors from previous models.

50
New cards

What are residuals in boosting?

Differences between observed and predicted values.

51
New cards

How does boosting update predictions?

By adding new models that correct previous errors.
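The sequential update from the cards above — fit a shallow tree to the current residuals, then add a shrunken version of it to the running prediction — can be sketched with one-split "stumps" as the weak learners. This is a minimal illustration for squared-error loss, not a production implementation; the helper names are made up:

```python
def fit_stump(x, r):
    """Fit a one-split regression stump: find the threshold on x that
    minimizes the squared error of the left/right residual means."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        t = (x[order[k - 1]] + x[order[k]]) / 2
        left = [r[i] for i in range(len(x)) if x[i] <= t]
        right = [r[i] for i in range(len(x)) if x[i] > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - lm) ** 2 for v in left)
               + sum((v - rm) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def boost(x, y, n_trees=50, shrinkage=0.1):
    """Boosting loop: repeatedly fit a stump to the residuals and add a
    shrunken copy of its predictions to the current fit."""
    pred = [sum(y) / len(y)] * len(y)  # start from the overall mean
    for _ in range(n_trees):
        r = [yi - pi for yi, pi in zip(y, pred)]           # current residuals
        h = fit_stump(x, r)                                # weak learner on residuals
        pred = [pi + shrinkage * h(xi)                     # f <- f + lambda * h
                for pi, xi in zip(pred, x)]
    return pred
```

Each round shrinks the residuals a little; a smaller `shrinkage` slows this down and needs more trees, matching the learning-rate cards that follow.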

52
New cards

What is the shrinkage parameter?

The learning rate λ in boosting: it scales how much each new tree contributes to the running prediction.

53
New cards

Why is a small shrinkage parameter used?

To ensure slow and stable learning.

54
New cards

What happens if shrinkage is too large?

Model may overfit quickly.

55
New cards

What happens if shrinkage is small?

Requires more trees but improves generalization.

56
New cards

What is the number of trees in boosting?

Number of sequential models added.

57
New cards

Can boosting overfit?

Yes, if too many trees are used.

58
New cards

What is the depth parameter in boosting?

Controls complexity of individual trees.

59
New cards

What does depth represent?

Maximum number of splits per tree.

60
New cards

What is the effect of depth on model?

Higher depth increases interaction complexity.

61
New cards

What is gradient boosting?

A boosting method that fits trees to gradients of the loss function.

62
New cards

What are pseudo-residuals?

Negative gradients of the loss function with respect to the current predictions, used as regression targets for the next tree.

63
New cards

What is the role of loss function in boosting?

Defines what errors are minimized.

64
New cards

How does boosting reduce bias?

By iteratively improving model predictions.

65
New cards

What is stochastic gradient boosting?

A variant that uses random subsamples of data.

66
New cards

Why use subsampling in boosting?

To reduce variance and improve efficiency.

67
New cards

What is a typical subsample size?

About half of the data.

68
New cards

What is XGBoost?

An advanced implementation of gradient boosting.

69
New cards

Why is XGBoost popular?

It is efficient and performs well in practice.

70
New cards

Bias-variance tradeoff in bagging?

Reduces variance with little change in bias.

71
New cards

Bias-variance tradeoff in random forests?

Further reduces variance with slight increase in bias.

72
New cards

Bias-variance tradeoff in boosting?

Primarily reduces bias but can increase variance if overfit.