Tree Based Methods
Tree Based Models
Non-Linear
Classification and Regression Trees (CART) (Decision Trees)
Goal of the decision tree is to split the data into chunks of like observations (pure nodes)
Then use the tree structure to make inferences on data
Makes few assumptions about the input dataset
○ No linearity assumptions!
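A minimal sketch of this idea, assuming scikit-learn and its bundled iris dataset: fit a tree, print the learned splits, and route new points through them.

```python
# Minimal sketch (scikit-learn + bundled iris data): fit a decision tree,
# inspect the learned splits, and use the tree structure for predictions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Internal nodes are threshold tests on single features; leaves are the
# (nearly) pure chunks the tree was grown to produce.
print(export_text(tree, feature_names=data.feature_names))

# New observations are routed down the splits to a leaf to get a prediction.
print(tree.predict(data.data[:3]))
```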
Graph Theory
Gini of Split
Used to split decision tree chunks
Want the split with the lowest possible Gini Impurity
If multiple splits are tied, choose one at random
Compute the Gini for every candidate split across all predictors
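A minimal NumPy-only sketch of the calculation: node impurity is 1 minus the sum of squared class proportions, and the Gini of a split is the sample-weighted average of its two children.

```python
# Gini impurity of a node (1 - sum of squared class proportions) and of a
# candidate split (sample-weighted average of the child nodes' impurities).
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_of_split(left_labels, right_labels):
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    return (n_left / n) * gini(left_labels) + (n_right / n) * gini(right_labels)

# Hypothetical candidate split: left chunk is pure, right chunk is mixed.
left = np.array([0, 0, 0, 0])
right = np.array([0, 1, 1, 1])
print(gini_of_split(left, right))  # 0.1875 -- pick the candidate with the lowest value
```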
How do we know to stop splitting?
Tree Depth - how many times do you want to split the data?
Minimum samples in leaf - how small do you want the leaf nodes?
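A minimal sketch, assuming scikit-learn and its bundled iris data, of how these two stopping rules are typically set:

```python
# Stopping rules as constructor arguments (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(
    max_depth=4,          # tree depth: at most 4 levels of splits
    min_samples_leaf=20,  # minimum samples in leaf: no leaf smaller than 20 samples
).fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```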
Feature Importance
Tells us how much each input variable (feature) contributes to the prediction of a model
It helps us understand which features matter most in determining the output
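A minimal sketch, assuming scikit-learn: after fitting, the tree exposes impurity-based per-feature importances that sum to 1.

```python
# Impurity-based feature importances from a fitted tree (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Larger values mean the feature contributed more impurity reduction overall.
for name, importance in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```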
Decision Tree Split Types
Classification - Gini, Entropy, Log Loss
Regression - MSE, Absolute Error, Poisson
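In recent scikit-learn versions these split types map onto the criterion parameter; a minimal sketch:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

clf = DecisionTreeClassifier(criterion="gini")          # or "entropy", "log_loss"
reg = DecisionTreeRegressor(criterion="squared_error")  # or "absolute_error", "poisson"
```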
Cost Complexity Pruning (Weakest Link Pruning)
A very large tree may overfit the data, so we prune it by removing unnecessary branches
Start with the full deep tree with many terminal nodes (𝛼 = 0); as 𝛼 increases, the cost of having many terminal nodes grows and branches get pruned away
Use Cross Validation to find the optimal 𝛼; each 𝛼 corresponds to a particular subtree T, so choosing 𝛼 and choosing the tree size are effectively the same decision (see the sketch below)
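A minimal sketch, assuming scikit-learn and its breast cancer dataset: enumerate the candidate 𝛼 values from the full tree, then cross-validate over ccp_alpha.

```python
# Cost complexity pruning: enumerate the effective alphas of the full tree,
# then pick the one with the best cross-validated score (scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Alphas at which successive branches of the full tree would be pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

scores = [
    (alpha, cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=alpha),
                            X, y, cv=5).mean())
    for alpha in path.ccp_alphas
]
best_alpha, best_score = max(scores, key=lambda s: s[1])
print(best_alpha, best_score)  # larger alpha => smaller (more heavily pruned) tree
```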
Improvement on Decision Trees
Decision Trees are weak learners with high variance
Trees grown on different samples or splits of the data will look very different from each other
We can use Decision Trees as the base for more complex models
Bootstrapping
Sub select the samples for the root node at random
Sampling with replacement, so the same observation can be selected more than once (and some not at all)
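A minimal NumPy sketch of drawing one bootstrap sample of row indices; the rows a tree never sees are its out-of-bag samples (used in the next section).

```python
# One bootstrap sample: n row indices drawn with replacement, so some rows
# repeat and roughly a third are left out ("out-of-bag") for that tree.
import numpy as np

rng = np.random.default_rng(0)
n = 10
boot_idx = rng.choice(n, size=n, replace=True)
oob_idx = np.setdiff1d(np.arange(n), boot_idx)
print(boot_idx)  # rows used to grow this tree (duplicates allowed)
print(oob_idx)   # rows this tree never saw
```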
Out of Bag (OOB) Error Estimation
For each training sample, average the error over only the trees that did not include it in their bootstrap sample
This is a valid estimate of the test error, since those trees never saw the given sample during training
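A minimal sketch, assuming scikit-learn: with oob_score=True, each training sample is predicted using only the trees whose bootstrap samples excluded it, giving a test-error estimate without a held-out set.

```python
# Out-of-bag accuracy from a random forest (scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)

# oob_score_ approximates test accuracy: every prediction behind it comes
# from trees that never saw that sample in their bootstrap sample.
print(rf.oob_score_)
```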
Random Forest (RF)
Sub-select the samples for the root node at random (using bootstrapping)
Sub-select the features at random at each split (sampling without replacement, can be selected only once)
Trees within the forest are not pruned
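A minimal sketch, assuming scikit-learn and its breast cancer dataset, contrasting plain bagged trees (all features considered at every split) with a random forest (random feature subset at each split); both bootstrap the rows and grow deep, unpruned trees.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=200, random_state=0)
forest = RandomForestClassifier(n_estimators=200,
                                max_features="sqrt",  # random feature subset at each split
                                random_state=0)       # trees left unpruned by default

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())
```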
Boosted Trees
Builds trees sequentially
For each data point, calculate the difference between the predicted value and the actual value.
These residuals represent what the first tree did not predict correctly.
This way, each new tree focuses on the mistakes of the previous trees
Fit each tree to the residuals of the previous trees instead of Ytrain
Fitting the data hard all at once may overfit, so Boosted Trees are a slow learner: each small tree, scaled by a learning rate, only nudges the fit (see the sketch below)
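A minimal sketch of the residual-fitting loop described above, using scikit-learn trees and the bundled diabetes data; each small tree is fit to the current residuals and added with a small learning rate so the model learns slowly.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

learning_rate = 0.1
prediction = np.zeros_like(y, dtype=float)  # start from a zero prediction
trees = []

for _ in range(100):
    residuals = y - prediction                     # what the ensemble has not yet explained
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # nudge the fit toward the residuals
    trees.append(tree)

print(np.mean((y - prediction) ** 2))  # training MSE shrinks as trees are added
```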
Iterative Random Forest (iRF)
The model is retrained multiple times, giving more weight to features that were consistently important in previous iterations.
This helps stabilize the identification of truly important features and reduce noise.
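A rough conceptual sketch only, not the published iRF algorithm (which reweights the feature choice inside every split): here each iteration hand-builds a small forest whose trees draw their feature subsets with probability proportional to the previous iteration's importances.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
n_samples, n_features = X.shape
k = int(np.sqrt(n_features))                 # features offered to each tree
weights = np.ones(n_features) / n_features   # start with uniform feature weights

for _ in range(5):  # outer iRF-style iterations
    importances = np.zeros(n_features)
    for _ in range(100):  # one small forest per iteration
        boot = rng.choice(n_samples, n_samples, replace=True)        # bootstrap rows
        feats = rng.choice(n_features, k, replace=False, p=weights)  # weighted feature subset
        tree = DecisionTreeClassifier(random_state=0).fit(X[boot][:, feats], y[boot])
        importances[feats] += tree.feature_importances_
    weights = importances + 1e-9
    weights /= weights.sum()  # features that matter repeatedly keep gaining weight

print(np.argsort(weights)[::-1][:5])  # indices of the most consistently important features
```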