Q: In decision trees, what are independent variables called?
Predictors (x).
Q: What determines the split at each node in a decision tree?
The split that best separates the classes, typically the one that minimizes the weighted Gini impurity of the resulting groups (for classification).
Q: When do you stop growing a decision tree?
When all nodes are pure (one label) or when nodes contain very few data points.
Q: What is Gini impurity?
A measure of how mixed the labels are within a group; lower Gini impurity is better.
Q: How is Gini impurity calculated for a group?
GI = 1 - p1^2 - p0^2
(two-class case, where p1 and p0 are the proportions of class 1 and class 0; in general, GI = 1 minus the sum of squared class proportions)
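A minimal R sketch of this calculation (the `gini` helper and the example label vectors are made up for illustration):

```r
# Gini impurity of a vector of class labels:
# 1 minus the sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini(c(1, 1, 1, 1))  # 0     -> pure group
gini(c(1, 1, 0, 0))  # 0.5   -> maximally mixed (two classes)
gini(c(1, 1, 1, 0))  # 0.375
```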
Q: In decision trees, do we want Gini impurity to be high or low?
As low as possible.
Q: What is bootstrapping?
Resampling a dataset with replacement to create many new datasets, and analyzing variability across them.
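A minimal R sketch of the idea, using a made-up numeric sample:

```r
set.seed(42)
x <- c(5, 7, 8, 9, 12)  # toy data (made up)

# One bootstrap sample: draw n observations with replacement
boot_sample <- sample(x, size = length(x), replace = TRUE)

# Variability across many bootstrap samples, e.g. of the sample mean
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
sd(boot_means)  # bootstrap estimate of the standard error of the mean
```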
Q: What is bootstrap aggregation (bagging)?
Training a model on each bootstrapped sample and aggregating their predictions (e.g., by majority vote or averaging) to reduce overfitting.
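A hand-rolled bagging sketch in R with rpart trees (the tree count of 25 and the built-in iris data are arbitrary choices for illustration):

```r
library(rpart)

set.seed(1)
n <- nrow(iris)

# Train one tree per bootstrap sample of the rows
trees <- lapply(1:25, function(i) {
  idx <- sample(n, replace = TRUE)
  rpart(Species ~ ., data = iris[idx, ], method = "class")
})

# Aggregate: majority vote across the trees' class predictions
votes <- sapply(trees, function(t) as.character(predict(t, iris, type = "class")))
bagged_pred <- apply(votes, 1, function(row) names(which.max(table(row))))
```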
Q: Why does a fully grown single decision tree overfit?
Because it can perfectly memorize the training data, losing generalization ability.
Q: What is a random forest?
An ensemble of decision trees grown on bootstrapped samples, with random feature selection at splits.
Q: How does a random forest reduce overfitting compared to a single decision tree?
By averaging the predictions of many decorrelated trees, which reduces variance.
Q: What extra randomness is added in random forests beyond bootstrapping?
At each split, only a random subset of predictors is considered.
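A short sketch with the randomForest package (ntree = 500 and mtry = 2 are illustrative settings, using the built-in iris data):

```r
library(randomForest)

set.seed(1)
# mtry = number of randomly chosen predictors considered at each split;
# the classification default is roughly sqrt(number of predictors)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
rf                       # prints the out-of-bag (OOB) error estimate
predict(rf, head(iris))  # class predictions for the first few rows
```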
Q: What R packages are used for building decision trees?
rpart, tree, and party.
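A quick sketch with rpart (the built-in iris data and default settings are just for illustration):

```r
library(rpart)

# Grow a classification tree; splits are chosen to reduce Gini impurity by default
fit <- rpart(Species ~ ., data = iris, method = "class")
printcp(fit)         # complexity table for pruning decisions
plot(fit); text(fit) # quick look at the splits
```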
Q: What R package is commonly used for random forests?
randomForest.
Q: What R package helps with cross-validation and tuning models?
caret.
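A minimal caret sketch, cross-validating a random forest and tuning mtry (the 10 folds and the mtry grid are arbitrary illustrative choices):

```r
library(caret)

set.seed(1)
ctrl <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation
fit <- train(Species ~ ., data = iris,
             method    = "rf",  # randomForest under the hood
             trControl = ctrl,
             tuneGrid  = data.frame(mtry = 1:4))
fit$bestTune  # the mtry value with the best cross-validated accuracy
```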