Q: In decision trees, what are independent variables called?
Predictors (x).
Q: What determines the split at each node in a decision tree?
The split that best separates the classes, typically the one that minimizes the weighted Gini impurity of the resulting groups (for classification).
Q: When do you stop growing a decision tree?
When all nodes are pure (one label) or when nodes contain very few data points.
Q: What is Gini impurity?
A measure of how mixed the labels are within a group; lower Gini impurity is better.
Q: How is Gini impurity calculated for a group?
GI = 1 - p1^2 - p0^2
(two-class case, where p1 and p0 are the proportions of class 1 and class 0; in general, GI = 1 minus the sum of squared class proportions)
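A minimal R sketch of this calculation (the `gini` helper and the example label vectors are made up for illustration):

```r
# Gini impurity of a vector of class labels:
# 1 minus the sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini(c(1, 1, 1, 1))  # 0     -> pure group
gini(c(1, 1, 0, 0))  # 0.5   -> maximally mixed (two classes)
gini(c(1, 1, 1, 0))  # 0.375
```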
Q: In decision trees, do we want Gini impurity to be high or low?
As low as possible.
Q: What is bootstrapping?
Resampling a dataset with replacement to create many new datasets, and analyzing variability across them.
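A minimal R sketch of the idea, using a made-up numeric sample:

```r
set.seed(42)
x <- c(5, 7, 8, 9, 12)  # toy data (made up)

# One bootstrap sample: draw n observations with replacement
boot_sample <- sample(x, size = length(x), replace = TRUE)

# Variability across many bootstrap samples, e.g. of the sample mean
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
sd(boot_means)  # bootstrap estimate of the standard error of the mean
```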
Q: What is bootstrap aggregation (bagging)?
Training a model on each bootstrapped sample and aggregating their predictions (e.g., by majority vote or averaging) to reduce overfitting.
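A hand-rolled bagging sketch in R with rpart trees (the tree count of 25 and the built-in iris data are arbitrary choices for illustration):

```r
library(rpart)

set.seed(1)
n <- nrow(iris)

# Train one tree per bootstrap sample of the rows
trees <- lapply(1:25, function(i) {
  idx <- sample(n, replace = TRUE)
  rpart(Species ~ ., data = iris[idx, ], method = "class")
})

# Aggregate: majority vote across the trees' class predictions
votes <- sapply(trees, function(t) as.character(predict(t, iris, type = "class")))
bagged_pred <- apply(votes, 1, function(row) names(which.max(table(row))))
```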
Q: Why does a fully grown single decision tree overfit?
Because it can perfectly memorize the training data, losing generalization ability.
Q: What is a random forest?
An ensemble of decision trees grown on bootstrapped samples, with random feature selection at splits.
Q: How does a random forest reduce overfitting compared to a single decision tree?
By averaging the predictions of many decorrelated trees, which reduces variance.
Q: What extra randomness is added in random forests beyond bootstrapping?
At each split, only a random subset of predictors is considered.
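A short sketch with the randomForest package (ntree = 500 and mtry = 2 are illustrative settings, using the built-in iris data):

```r
library(randomForest)

set.seed(1)
# mtry = number of randomly chosen predictors considered at each split;
# the classification default is roughly sqrt(number of predictors)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
rf                       # prints the out-of-bag (OOB) error estimate
predict(rf, head(iris))  # class predictions for the first few rows
```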
Q: What R packages are used for building decision trees?
rpart, tree, and party.
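A quick sketch with rpart (the built-in iris data and default settings are just for illustration):

```r
library(rpart)

# Grow a classification tree; splits are chosen to reduce Gini impurity by default
fit <- rpart(Species ~ ., data = iris, method = "class")
printcp(fit)         # complexity table for pruning decisions
plot(fit); text(fit) # quick look at the splits
```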
Q: What R package is commonly used for random forests?
randomForest.
Q: What R package helps with cross-validation and tuning models?
caret.
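A minimal caret sketch, cross-validating a random forest and tuning mtry (the 10 folds and the mtry grid are arbitrary illustrative choices):

```r
library(caret)

set.seed(1)
ctrl <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation
fit <- train(Species ~ ., data = iris,
             method    = "rf",  # randomForest under the hood
             trControl = ctrl,
             tuneGrid  = data.frame(mtry = 1:4))
fit$bestTune  # the mtry value with the best cross-validated accuracy
```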