Regression and Classification Trees:
R
Region of predictor space
Regression and Classification Trees:
n_m
Number of observations in node m
Regression and Classification Trees:
n_(m,c)
Number of category c observations in node m
Regression and Classification Trees:
I
Impurity
Regression and Classification Trees:
E
Classification error rate
Regression and Classification Trees:
G
Gini index
Regression and Classification Trees:
D
Cross entropy
Regression and Classification Trees:
T
Subtree
Regression and Classification Trees:
|T|
Number of terminal nodes in T
Regression and Classification Trees:
λ
Tuning parameter
Decision Tree
Visually shows the partitions within a predictor space.
The decision tree provides an intuitive way to predict the response for ________________.
new observations
Decision Trees:
Left Branch
Statement is true
Decision Trees:
Right Branch
Statement is false
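The left-true / right-false convention can be checked on a fitted tree's printout. A minimal sketch (not from the original cards), assuming scikit-learn and its built-in iris data; in export_text the "<=" branch listed first is the left, condition-true branch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow classification tree to a built-in dataset.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each split prints the "<= threshold" (statement true) branch first --
# the left branch -- followed by the "> threshold" (statement false) branch.
print(export_text(tree, feature_names=load_iris().feature_names))
```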
Regression and Classification Trees:
Algorithm
1. Construct a large tree with g terminal nodes using ______________________.
2. Obtain a sequence of best subtrees, as a function of λ, using _________________________.
3. Choose λ by applying ___________________. Select the λ that results in the lowest _______________________.
4. The best subtree is the subtree created in step 2 with the selected ____________.
recursive binary splitting
cost complexity pruning
k-fold cross validation
cross-validation error
λ value
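A sketch of the grow-prune-select steps above, assuming scikit-learn, where the tuning parameter is exposed as ccp_alpha; the dataset and cv=5 are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Step 1: grow a large tree with recursive binary splitting (no pruning).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Step 2: cost complexity pruning yields a sequence of candidate subtrees,
# indexed by the tuning parameter (ccp_alpha in scikit-learn).
alphas = full_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# Step 3: choose the tuning parameter by k-fold cross-validation,
# keeping the value with the lowest cross-validation error.
cv_errors = [
    1 - cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5
    ).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmin(cv_errors))]

# Step 4: the best subtree is the one grown with the selected value.
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(best_alpha, best_tree.get_n_leaves())
```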
Recursive Binary Splitting:
Classification
Minimize
1/n • Σ (m=1 to g) {n_m • I_m}
Recursive Binary Splitting:
Classification
p̂_(m,c) =
= n_(m,c) / n_m
Recursive Binary Splitting:
Classification
E_m =
= 1 - max_c {p̂_(m,c)}
Recursive Binary Splitting:
Classification
G_m =
= Σ (c=1 to w) {p̂_(m,c) • (1 - p̂_(m,c))}
Recursive Binary Splitting:
Classification
D_m =
= -Σ (c=1 to w) {p̂_(m,c) • ln(p̂_(m,c))}
Recursive Binary Splitting:
Classification
deviance =
= -2 Σ (m=1 to g) Σ (c=1 to w) {n_(m,c) • ln(p̂_(m,c))}
Recursive Binary Splitting:
Classification
residual mean deviance =
= deviance / (n - g)
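A quick numerical check of the impurity and deviance formulas above, assuming NumPy; the 2-node, 2-category count table is made up for illustration:

```python
import numpy as np

# Hypothetical counts n_(m,c): rows are terminal nodes m, columns categories c.
n_mc = np.array([[40.0, 10.0],
                 [5.0, 45.0]])
n_m = n_mc.sum(axis=1)                 # n_m: observations in node m
p_hat = n_mc / n_m[:, None]            # p̂_(m,c) = n_(m,c) / n_m

error = 1 - p_hat.max(axis=1)                      # E_m
gini = (p_hat * (1 - p_hat)).sum(axis=1)           # G_m
entropy = -(p_hat * np.log(p_hat)).sum(axis=1)     # D_m

# Deviance sums n_(m,c) * ln(p̂_(m,c)) over all nodes and categories.
deviance = -2 * (n_mc * np.log(p_hat)).sum()
n, g = n_mc.sum(), n_mc.shape[0]
residual_mean_deviance = deviance / (n - g)

print(error, gini, entropy, deviance, residual_mean_deviance)
```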
Classification Error Rate
1. _______________ able to capture purity improvement.
2. Focuses on _________________________.
3. Preferred for __________________.
Less
misclassified observations
pruning trees (simpler tree with lower variance and higher prediction accuracy)
Gini Index and Cross Entropy
1. ______________ able to capture purity improvement.
2. Focuses on ______________________.
3. Preferred for ___________________.
More
maximizing node purity
growing trees
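Why the Gini index (and cross entropy) capture purity improvement better than the classification error rate: a small worked example, using the standard two-class counts (illustrative, not from the cards), where two splits have the same error rate but different Gini values:

```python
import numpy as np

def weighted_impurity(nodes, impurity):
    """Weighted impurity 1/n • Σ over nodes of n_m • I_m."""
    n = sum(counts.sum() for counts in nodes)
    return sum(counts.sum() * impurity(counts / counts.sum()) for counts in nodes) / n

error = lambda p: 1 - p.max()            # classification error rate E_m
gini = lambda p: (p * (1 - p)).sum()     # Gini index G_m

# Parent node: 400 observations of each class.  Two candidate splits.
split_a = [np.array([300.0, 100.0]), np.array([100.0, 300.0])]
split_b = [np.array([200.0, 400.0]), np.array([200.0, 0.0])]

# Both splits misclassify 25% of observations, so the error rate cannot
# tell them apart; the Gini index is lower for split B, which creates a
# pure node, so it "sees" the purity improvement.
print(weighted_impurity(split_a, error), weighted_impurity(split_b, error))
print(weighted_impurity(split_a, gini), weighted_impurity(split_b, gini))
```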
Cost Complexity Pruning:
_______________ or _______________ represent the partitions of the predictor space.
Terminal nodes
leaves
Cost Complexity Pruning:
__________________ are points along the tree where splits occur.
Internal nodes
Cost Complexity Pruning:
Terminal nodes do not have __________________, but internal nodes do.
child nodes
Cost Complexity Pruning:
__________________ are lines that connect any two nodes.
Branches
Cost Complexity Pruning:
A decision tree with only one internal node is called a ________________.
stump
Split Point
Midpoint of two unique consecutive values for a given explanatory variable
Recursive Binary Splitting Stopping Criterion
At least 5 observations in each terminal node.
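A tiny sketch of the split-point rule, assuming NumPy; the predictor values are made up:

```python
import numpy as np

def candidate_split_points(x):
    """Midpoints between consecutive unique values of one explanatory variable."""
    unique_vals = np.unique(x)               # sorted unique values
    return (unique_vals[:-1] + unique_vals[1:]) / 2

# Example: hypothetical values of a single predictor.
x = np.array([3.1, 2.0, 2.0, 4.5, 3.1, 5.0])
print(candidate_split_points(x))             # [2.55 3.8  4.75]
```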
Recursive binary splitting only produces _________________ regions.
rectangular
Recursive binary splitting usually produces ___________ and ____________ trees. The bigger the tree, the more terminal nodes there are, which means the more _____________ it is. This also means a higher chance of _____________ the data.
large
complex
flexible
overfitting
Advantages of Trees:
1. Easy to ___________ and _______________.
2. Can be presented _______________.
3. Handle ______________ variables without needing _______________ variables.
4. Mimic _________________________.
interpret
explain
visually
categorical
dummy
human decision-making
Disadvantages of Trees:
1. Not _____________.
2. Do not have the same degree of __________________ as other statistical methods.
robust
predictive accuracy
Pruning
Remove an internal node and all of the nodes that follow it.
Linear models good for approximately ______________________.
linear relationships
Decision trees good for more _________________________.
complicated relationships
Bagging, Random Forests, and Boosting
Improve the predictive accuracy of trees.
Bootstrapping
Sampling with replacement to create artificial samples from the set of observations.
Bootstrapping
Original Set =
Bootstrap Samples =
Distinct Bootstrap Samples =
= n observations
= n^(n) samples
= (2n-1) choose (n-1) samples
Probability an observation is not included in a given bootstrap sample =
which converges to ________________ as n approaches ________________.
= (1 - 1/n)^n
1/e
infinity
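A numerical check of the bootstrap counts and the 1/e limit above, for an illustrative n = 10:

```python
import math

n = 10  # illustrative sample size

# Number of (ordered) bootstrap samples and of distinct bootstrap samples.
ordered = n ** n
distinct = math.comb(2 * n - 1, n - 1)

# Probability a given observation never appears in one bootstrap sample,
# which approaches 1/e ≈ 0.368 as n grows.
p_missing = (1 - 1 / n) ** n
print(ordered, distinct, p_missing, 1 / math.e)
```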
Multiple Trees:
Bagging Steps
1. Create __________________________ from the original training dataset.
2. Construct a ______________ for each bootstrap sample using ______________________.
3. Predict the response of a new observation by ________________________ (regression trees) or by ____________________________ (classification trees) across all b trees.
b bootstrap samples
decision tree (called bagged trees)
recursive binary splitting
averaging the predictions
using the most frequent category
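A minimal from-scratch sketch of the bagging steps for regression trees, assuming scikit-learn trees and its diabetes data; b = 100 is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
rng = np.random.default_rng(0)
b = 100  # number of bootstrap samples / bagged trees

# Steps 1-2: fit an unpruned tree to each bootstrap sample.
trees = []
for _ in range(b):
    idx = rng.integers(0, len(y), size=len(y))   # sample rows with replacement
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Step 3: average the b predictions (regression trees); for classification
# trees, take the most frequent predicted category instead.
x_new = X[:5]
bagged_pred = np.mean([t.predict(x_new) for t in trees], axis=0)
print(bagged_pred)
```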
Bagged trees are not _______________.
pruned
Bagging:
As b increases, model accuracy _____________, variance ________________ (due to bagging)
increases
decreases
Bagging:
Bagging makes it more _____________ to interpret the bagged model as a whole since we cannot visualize all b bagged trees with a _________________. If we have a single tree (without bagging), we use ___________________ to estimate the test error.
difficult
single tree
cross-validation
Bagging Properties:
1. Increasing b does not cause ________________.
2. Bagging reduces _____________________.
3. ___________________ is a valid estimate of test error (with bagging).
overfitting
variance
Out-of-bag error
Calculating the OOB Error for a Bagged Model:
1. For each bagged tree, predict the response for each ________________.
2. Summarize predictions and compute the OOB error as the ____________ for regression trees or the ___________________ for classification trees.
3. Graph the OOB error against the number of ________________.
out-of-bag observation
test MSE
test error rate
bagged trees (want the lowest OOB error)
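One way to obtain the OOB error in practice, assuming scikit-learn's BaggingRegressor (whose default base learner is a decision tree); the dataset and n_estimators are illustrative:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor

X, y = load_diabetes(return_X_y=True)

# oob_score=True makes each observation's prediction use only the trees
# whose bootstrap samples did not include that observation.
bagged = BaggingRegressor(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

# The stored OOB predictions give the OOB MSE for a regression model;
# oob_score_ is scikit-learn's own OOB summary score.
oob_mse = ((y - bagged.oob_prediction_) ** 2).mean()
print(bagged.oob_score_, oob_mse)
```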
Random Forests address a drawback of bagging:
Similar (highly correlated) bagged trees increase the correlation between predictions and diminish the variance-reducing power of bagging. Random forests decorrelate the trees.
Random Forests Steps:
1. Create _________________________ from the original training dataset.
2. Construct a __________________ for each bootstrap sample using ______________________. At each split, a random subset of ___________________ are considered.
3. Predict the response of a new observation by __________________________ (regression trees) or by _________________________________ (classification trees) across all b trees.
b bootstrap samples
decision tree
recursive binary splitting
k variables
averaging the predictions
using the most frequent category
Random Forests Properties:
1. __________________ is a special case of random forests.
2. Increasing b does not cause ____________________.
3. Decreasing k reduces the _________________________________.
Bagging (k=p)
overfitting
correlation between predictions
Random Forests k values:
Regression Trees
k =
= p/3
Random Forests k values:
Classification Trees
k =
= sqrt(p)
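A sketch of random forests with the k values above, assuming scikit-learn, where k is set through max_features; the datasets and n_estimators are illustrative, and setting max_features = p recovers bagging:

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification trees: consider roughly sqrt(p) predictors at each split.
Xc, yc = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                             oob_score=True, random_state=0).fit(Xc, yc)

# Regression trees: a common choice is about p/3 predictors per split.
Xr, yr = load_diabetes(return_X_y=True)
p = Xr.shape[1]
reg = RandomForestRegressor(n_estimators=500, max_features=max(1, p // 3),
                            oob_score=True, random_state=0).fit(Xr, yr)

print(clf.oob_score_, reg.oob_score_)
```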
OOB Error:
Bagging _______ Random Forests (not always though)
>
Performance:
Bagging _______ Random Forests
<
Boosting does not involve _______________.
bootstrapping
Boosting grows trees ______________ using information from previously grown trees.
sequentially
Boosting Steps:
Let z_1 be the actual response variable, y.
1. For k = 1, 2, ..., b:
• Use recursive binary splitting to fit a tree with d splits to the data with z_k as the response.
• Update z_k by subtracting λ • f̂_k(x), i.e. let z_(k+1) = z_k - λ • f̂_k(x).
2. Calculate the boosted model prediction as f̂(x) = Σ (k=1 to b) {λ • f̂_k(x)}.
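A from-scratch sketch of these boosting steps, assuming scikit-learn regression trees on its diabetes data; max_depth stands in for the d-split control, and the values of b, d, and λ (lam) are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
b, d, lam = 200, 2, 0.05   # number of trees, tree size control, shrinkage λ

trees = []
z = y.astype(float)                        # z_1 is the actual response y
for _ in range(b):
    # Fit a small tree to the current working response z_k ...
    tree = DecisionTreeRegressor(max_depth=d, random_state=0).fit(X, z)
    trees.append(tree)
    # ... then update: z_(k+1) = z_k - λ • f̂_k(x)
    z = z - lam * tree.predict(X)

# Boosted prediction: f̂(x) = Σ over k of λ • f̂_k(x)
f_hat = lam * sum(tree.predict(X) for tree in trees)
print(np.mean((y - f_hat) ** 2))           # training MSE of the boosted model
```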
Boosting Properties:
1. Increasing b can cause _________________.
2. Boosting reduces _________________.
3. d controls _________________ of the boosted model.
4. λ controls the ____________ at which boosting _________________.
overfitting
bias
complexity
rate
learns