Lecture 8: Decision Trees and Random Forest 1

15 Terms

1

Q: In decision trees, what are independent variables called?

Predictors (x).

2

Q: What determines the split at each node in a decision tree?

The split that best separates the classes: candidate splits are scored, typically with Gini impurity for classification, and the one yielding the purest child nodes is chosen.

3

Q: When do you stop growing a decision tree?

When all nodes are pure (one label) or when nodes contain very few data points.

4

Q: What is Gini impurity?

A measure of how mixed the labels are within a group; lower Gini impurity is better.

5

Q: How is Gini impurity calculated for a group?

GI = 1 - p1^2 - p0^2
(where p1 and p0 are the proportions of class 1 and class 0 in the group; in general, GI = 1 - sum of pi^2 over all classes)
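
A minimal R sketch of this calculation (the gini helper name is mine, not from the lecture); using the general formula, it also handles more than two classes:

    gini <- function(labels) {
      p <- table(labels) / length(labels)  # class proportions
      1 - sum(p^2)                         # 1 minus sum of squared proportions
    }
    gini(c(1, 1, 1, 0))  # 1 - 0.75^2 - 0.25^2 = 0.375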

6

Q: In decision trees, do we want Gini impurity to be high or low?

As low as possible.

7

Q: What is bootstrapping?

Resampling a dataset with replacement to create many new datasets, and analyzing variability across them.
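
A small R illustration of resampling with replacement (the dataset and statistic are arbitrary choices, not from the lecture):

    set.seed(1)
    # One bootstrap sample: draw nrow(iris) rows with replacement
    boot_sample <- iris[sample(nrow(iris), replace = TRUE), ]
    # Variability across 1000 resamples of the mean sepal length
    boot_means <- replicate(1000, mean(sample(iris$Sepal.Length, replace = TRUE)))
    sd(boot_means)  # bootstrap estimate of the standard error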

8

Q: What is bootstrap aggregation (bagging)?

Training a model on each bootstrapped sample and aggregating the results to reduce overfitting.
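
A hand-rolled bagging sketch in R, assuming rpart is installed (the 25-tree count and the majority-vote step are illustrative):

    library(rpart)
    set.seed(1)
    # Grow one tree per bootstrap sample
    trees <- lapply(1:25, function(i) {
      idx <- sample(nrow(iris), replace = TRUE)
      rpart(Species ~ ., data = iris[idx, ], method = "class")
    })
    # Aggregate: majority vote across the 25 trees
    votes <- sapply(trees, function(t) as.character(predict(t, iris, type = "class")))
    bagged <- apply(votes, 1, function(v) names(which.max(table(v))))
    mean(bagged == iris$Species)  # ensemble accuracy on the training data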

9

Q: Why does a fully grown single decision tree overfit?

Because it can perfectly memorize the training data, losing generalization ability.

10

Q: What is a random forest?

An ensemble of decision trees grown on bootstrapped samples, with random feature selection at splits.

11

Q: How do random forests reduce overfitting compared to a single decision tree?

By aggregating the predictions of many trees (majority vote for classification, averaging for regression), which reduces variance.

12

Q: What extra randomness is added in random forests beyond bootstrapping?

At each split, only a random subset of predictors is considered.

13

Q: What R packages are used for building decision trees?

rpart, tree, and party.
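
A minimal rpart example on the built-in iris data (the data choice is mine, not from the lecture):

    library(rpart)
    fit <- rpart(Species ~ ., data = iris, method = "class")
    print(fit)              # text view of the splits
    plot(fit); text(fit)    # quick tree plot
    predict(fit, head(iris), type = "class")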

14

Q: What R package is commonly used for random forests?

randomForest.
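
A minimal randomForest example; mtry is the per-split predictor subset from card 12 (ntree = 500 and mtry = 2 are illustrative values; mtry defaults to roughly sqrt(p) for classification):

    library(randomForest)
    set.seed(1)
    rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
    print(rf)        # out-of-bag (OOB) error estimate and confusion matrix
    importance(rf)   # variable importance scores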

15

Q: What R package helps with cross-validation and tuning models?

caret.
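
A caret sketch that cross-validates and tunes mtry for a random forest (the fold count and tuning grid are illustrative; method = "rf" requires the randomForest package):

    library(caret)
    set.seed(1)
    ctrl <- trainControl(method = "cv", number = 10)       # 10-fold CV
    fit <- train(Species ~ ., data = iris, method = "rf",
                 trControl = ctrl,
                 tuneGrid = expand.grid(mtry = 1:4))       # candidate mtry values
    fit$bestTune   # chosen mtry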