Lecture 8: Decision Trees and Random Forest 1

15 Terms

1

Q: In decision trees, what are independent variables called?

Predictors (x).

2

Q: What determines the split at each node in a decision tree?

The split that best separates the data, typically using Gini impurity for classification.

3

Q: When do you stop growing a decision tree?

When all nodes are pure (one label) or when nodes contain very few data points.

4

Q: What is Gini impurity?

A measure of how mixed the labels are within a group; lower Gini impurity is better.

5

Q: How is Gini impurity calculated for a group?

GI = 1 - p1^2 - p0^2
(where p1 and p0 are the proportions of class 1 and 0, respectively)
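
A quick check of this formula in R, on a hypothetical group of 10 points with 6 in class 1 and 4 in class 0:

```r
# Gini impurity for a binary group: GI = 1 - p1^2 - p0^2
gini <- function(labels) {
  p1 <- mean(labels == 1)   # proportion of class 1
  p0 <- 1 - p1              # proportion of class 0
  1 - p1^2 - p0^2
}

gini(c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0))  # mixed group: 1 - 0.6^2 - 0.4^2 = 0.48
gini(c(1, 1, 1, 1))                    # pure group: GI = 0
```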

6

Q: In decision trees, do we want Gini impurity to be high or low?

As low as possible.

7

Q: What is bootstrapping?

Resampling a dataset with replacement to create many new datasets, and analyzing variability across them.
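
A minimal sketch in base R, using made-up numbers: draw 1,000 bootstrap samples (with replacement, same size as the original) and look at how the sample mean varies across them.

```r
# Bootstrapping: resample with replacement and track a statistic's variability.
set.seed(42)
x <- c(5, 7, 8, 9, 12, 14, 15, 18, 21, 25)   # illustrative data

boot_means <- replicate(1000, mean(sample(x, length(x), replace = TRUE)))

mean(boot_means)  # centered near mean(x)
sd(boot_means)    # bootstrap estimate of the standard error of the mean
```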

8

Q: What is bootstrap aggregation (bagging)?

Training a model on each bootstrapped sample and aggregating the results to reduce overfitting.
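
The bagging loop can be sketched in base R. This uses a linear model as the base learner purely for illustration; in bagged trees the base learner would be a decision tree.

```r
# Bagging sketch: train one model per bootstrap sample, then average predictions.
set.seed(1)
n  <- 100
df <- data.frame(x = runif(n))
df$y <- 3 * df$x + rnorm(n, sd = 0.5)   # simulated data: y ≈ 3x + noise

B <- 50
preds <- replicate(B, {
  idx <- sample(n, n, replace = TRUE)          # bootstrap sample
  fit <- lm(y ~ x, data = df[idx, ])           # train on the resample
  predict(fit, newdata = data.frame(x = 0.5))  # predict at a test point
})
mean(preds)  # aggregated (averaged) prediction at x = 0.5, roughly 3 * 0.5 = 1.5
```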

9

Q: Why does a fully grown single decision tree overfit?

Because it can perfectly memorize the training data, losing generalization ability.

10

Q: What is a random forest?

An ensemble of decision trees grown on bootstrapped samples, with random feature selection at splits.

11

Q: How do random forests reduce overfitting compared to a single decision tree?

By averaging the predictions of many trees, reducing variance.

12

Q: What extra randomness is added in random forests beyond bootstrapping?

At each split, only a random subset of predictors is considered.

13

Q: What R packages are used for building decision trees?

rpart, tree, and party.
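
A minimal rpart example on the built-in iris data (rpart ships with standard R distributions):

```r
# Fit a classification tree with rpart on iris.
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)                        # text view of the splits
pred <- predict(fit, iris, type = "class")
mean(pred == iris$Species)        # training accuracy (optimistic: same data)
```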

14

Q: What R package is commonly used for random forests?

randomForest.
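
A short sketch with the randomForest package, assuming it is installed (install.packages("randomForest") if not):

```r
# Random forest on iris: bootstrapped trees plus random feature selection.
library(randomForest)

set.seed(7)
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,
                   mtry  = 2)    # predictors considered at each split
print(rf)                        # includes the out-of-bag (OOB) error estimate
importance(rf)                   # variable importance per predictor
```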

15

Q: What R package helps with cross-validation and tuning models?

caret.
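
A short caret sketch, assuming the caret package is installed: tune an rpart tree with 5-fold cross-validation.

```r
# Cross-validated tuning with caret's train().
library(caret)

set.seed(7)
ctrl <- trainControl(method = "cv", number = 5)   # 5-fold cross-validation
fit  <- train(Species ~ ., data = iris,
              method    = "rpart",                # decision tree via rpart
              trControl = ctrl,
              tuneLength = 5)                     # try 5 values of cp
print(fit)   # accuracy for each candidate complexity parameter
```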