Regression and Classification Trees:
R
Region of predictor space
Regression and Classification Trees:
n_m
Number of observations in node m
Regression and Classification Trees:
n_(m,c)
Number of category c observations in node m
Regression and Classification Trees:
I
Impurity
Regression and Classification Trees:
E
Classification error rate
Regression and Classification Trees:
G
Gini index
Regression and Classification Trees:
D
Cross entropy
Regression and Classification Trees:
T
Subtree
Regression and Classification Trees:
|T|
Number of terminal nodes in T
Regression and Classification Trees:
λ
Tuning parameter
Decision Tree
Visually shows the partitions within a predictor space.
The decision tree provides an intuitive way to predict the response for ________________.
new observations
Decision Trees:
Left Branch
Statement is true
Decision Trees:
Right Branch
Statement is false
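The left-true / right-false convention can be checked on a fitted tree's printout. A minimal sketch (not from the original cards), assuming scikit-learn and its built-in iris data; in export_text the "<=" branch listed first is the left, condition-true branch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow classification tree to a built-in dataset.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each split prints the "<= threshold" (statement true) branch first --
# the left branch -- followed by the "> threshold" (statement false) branch.
print(export_text(tree, feature_names=load_iris().feature_names))
```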
Regression and Classification Trees:
Algorithm
1. Construct a large tree with g terminal nodes using ______________________.
2. Obtain a sequence of best subtrees, as a function of λ, using _________________________.
3. Choose λ by applying ___________________. Select the λ that results in the lowest _______________________.
4. The best subtree is the subtree created in step 2 with the selected ____________.
recursive binary splitting
cost complexity pruning
k-fold cross validation
cross-validation error
λ value
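A sketch of the grow-prune-select steps above, assuming scikit-learn, where the tuning parameter is exposed as ccp_alpha; the dataset and cv=5 are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Step 1: grow a large tree with recursive binary splitting (no pruning).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Step 2: cost complexity pruning yields a sequence of candidate subtrees,
# indexed by the tuning parameter (ccp_alpha in scikit-learn).
alphas = full_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# Step 3: choose the tuning parameter by k-fold cross-validation,
# keeping the value with the lowest cross-validation error.
cv_errors = [
    1 - cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5
    ).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmin(cv_errors))]

# Step 4: the best subtree is the one grown with the selected value.
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(best_alpha, best_tree.get_n_leaves())
```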
Recursive Binary Splitting:
Classification
Minimize
1/n • Σ (m=1 to g) {n_m • I_m}
Recursive Binary Splitting:
Classification
p̂_(m,c) =
= n_(m,c) / n_m
Recursive Binary Splitting:
Classification
E_m =
= 1 - max_c {p̂_(m,c)}
Recursive Binary Splitting:
Classification
G_m =
= Σ (c=1 to w) {p̂_(m,c) • (1 - p̂_(m,c))}
Recursive Binary Splitting:
Classification
D_m =
= -Σ (c=1 to w) {p̂_(m,c) • ln(p̂_(m,c))}
Recursive Binary Splitting:
Classification
deviance =
= -2 Σ (m=1 to g) Σ (c=1 to w) {n_(m,c) • ln(p̂_(m,c))}
Recursive Binary Splitting:
Classification
residual mean deviance =
= deviance / (n - g)
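A quick numerical check of the impurity and deviance formulas above, assuming NumPy; the 2-node, 2-category count table is made up for illustration:

```python
import numpy as np

# Hypothetical counts n_(m,c): rows are terminal nodes m, columns categories c.
n_mc = np.array([[40.0, 10.0],
                 [5.0, 45.0]])
n_m = n_mc.sum(axis=1)                 # n_m: observations in node m
p_hat = n_mc / n_m[:, None]            # p̂_(m,c) = n_(m,c) / n_m

error = 1 - p_hat.max(axis=1)                      # E_m
gini = (p_hat * (1 - p_hat)).sum(axis=1)           # G_m
entropy = -(p_hat * np.log(p_hat)).sum(axis=1)     # D_m

# Deviance sums n_(m,c) * ln(p̂_(m,c)) over all nodes and categories.
deviance = -2 * (n_mc * np.log(p_hat)).sum()
n, g = n_mc.sum(), n_mc.shape[0]
residual_mean_deviance = deviance / (n - g)

print(error, gini, entropy, deviance, residual_mean_deviance)
```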
Classification Error Rate
1. _______________ able to capture purity improvement.
2. Focuses on _________________________.
3. Preferred for __________________.
Less
misclassified observations
pruning trees (simpler tree with lower variance and higher prediction accuracy)
Gini Index and Cross Entropy
1. ______________ able to capture purity improvement.
2. Focuses on ______________________.
3. Preferred for ___________________.
More
maximizing node purity
growing trees
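Why the Gini index (and cross entropy) capture purity improvement better than the classification error rate: a small worked example, using the standard two-class counts (illustrative, not from the cards), where two splits have the same error rate but different Gini values:

```python
import numpy as np

def weighted_impurity(nodes, impurity):
    """Weighted impurity 1/n • Σ over nodes of n_m • I_m."""
    n = sum(counts.sum() for counts in nodes)
    return sum(counts.sum() * impurity(counts / counts.sum()) for counts in nodes) / n

error = lambda p: 1 - p.max()            # classification error rate E_m
gini = lambda p: (p * (1 - p)).sum()     # Gini index G_m

# Parent node: 400 observations of each class.  Two candidate splits.
split_a = [np.array([300.0, 100.0]), np.array([100.0, 300.0])]
split_b = [np.array([200.0, 400.0]), np.array([200.0, 0.0])]

# Both splits misclassify 25% of observations, so the error rate cannot
# tell them apart; the Gini index is lower for split B, which creates a
# pure node, so it "sees" the purity improvement.
print(weighted_impurity(split_a, error), weighted_impurity(split_b, error))
print(weighted_impurity(split_a, gini), weighted_impurity(split_b, gini))
```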
Cost Complexity Pruning:
_______________ or _______________ represent the partitions of the predictor space.
Terminal nodes
leaves
Cost Complexity Pruning:
__________________ are points along the tree where splits occur.
Internal nodes
Cost Complexity Pruning:
Terminal nodes do not have __________________, but internal nodes do.
child nodes
Cost Complexity Pruning:
__________________ are lines that connect any two nodes.
Branches
Cost Complexity Pruning:
A decision tree with only one internal node is called a ________________.
stump
Split Point
Midpoint of two unique consecutive values for a given explanatory variable
Recursive Binary Splitting Stopping Criterion
At least 5 observations in each terminal node.
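A tiny sketch of the split-point rule, assuming NumPy; the predictor values are made up:

```python
import numpy as np

def candidate_split_points(x):
    """Midpoints between consecutive unique values of one explanatory variable."""
    unique_vals = np.unique(x)               # sorted unique values
    return (unique_vals[:-1] + unique_vals[1:]) / 2

# Example: hypothetical values of a single predictor.
x = np.array([3.1, 2.0, 2.0, 4.5, 3.1, 5.0])
print(candidate_split_points(x))             # [2.55 3.8  4.75]
```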
Recursive binary splitting only produces _________________ regions.
rectangular
Recursive binary splitting usually produces ___________ and ____________ trees. The bigger the tree, the more terminal nodes there are, which means the more _____________ it is. This also means a higher chance of _____________ the data.
large
complex
flexible
overfitting
Advantages of Trees:
1. Easy to ___________ and _______________.
2. Can be presented _______________.
3. Handle ______________ variables without needing _______________ variables.
4. Mimic _________________________.
interpret
explain
visually
categorical
dummy
human decision-making
Disadvantages of Trees:
1. Not _____________.
2. Do not have the same degree of __________________ as other statistical methods.
robust
predictive accuracy
Pruning
Remove an internal node and all of the nodes that follow it.
Linear models good for approximately ______________________.
linear relationships
Decision trees good for more _________________________.
complicated relationships
Bagging, Random Forests, and Boosting
Improve the predictive accuracy of trees.
Bootstrapping
Sampling with replacement to create artificial samples from the set of observations.
Bootstrapping
Original Set =
Bootstrap Samples =
Distinct Bootstrap Samples =
= n observations
= n^(n) samples
= (2n-1) choose (n-1) samples
Probability an observation is not included in a given bootstrap sample =
which converges to ________________ as n approaches ________________.
= (1 - 1/n)^n
1/e
infinity
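A numerical check of the bootstrap counts and the 1/e limit above, for an illustrative n = 10:

```python
import math

n = 10  # illustrative sample size

# Number of (ordered) bootstrap samples and of distinct bootstrap samples.
ordered = n ** n
distinct = math.comb(2 * n - 1, n - 1)

# Probability a given observation never appears in one bootstrap sample,
# which approaches 1/e ≈ 0.368 as n grows.
p_missing = (1 - 1 / n) ** n
print(ordered, distinct, p_missing, 1 / math.e)
```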
Multiple Trees:
Bagging Steps
1. Create __________________________ from the original training dataset.
2. Construct a ______________ for each bootstrap sample using ______________________.
3. Predict the response of a new observation by ________________________ (regression trees) or by ____________________________ (classification trees) across all b trees.
b bootstrap samples
decision tree (called bagged trees)
recursive binary splitting
averaging the predictions
using the most frequent category
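A minimal from-scratch sketch of the bagging steps for regression trees, assuming scikit-learn trees and its diabetes data; b = 100 is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
rng = np.random.default_rng(0)
b = 100  # number of bootstrap samples / bagged trees

# Steps 1-2: fit an unpruned tree to each bootstrap sample.
trees = []
for _ in range(b):
    idx = rng.integers(0, len(y), size=len(y))   # sample rows with replacement
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Step 3: average the b predictions (regression trees); for classification
# trees, take the most frequent predicted category instead.
x_new = X[:5]
bagged_pred = np.mean([t.predict(x_new) for t in trees], axis=0)
print(bagged_pred)
```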
Bagged trees are not _______________.
pruned
Bagging:
As b increases, model accuracy _____________, variance ________________ (due to bagging)
increases
decreases
Bagging:
Bagging makes it more _____________ to interpret the bagged model as a whole since we cannot visualize all b bagged trees with a _________________. If we have a single tree (without bagging), we use ___________________ to estimate the test error.
difficult
single tree
cross-validation
Bagging Properties:
1. Increasing b does not cause ________________.
2. Bagging reduces _____________________.
3. ___________________ is a valid estimate of test error (with bagging).
overfitting
variance
Out-of-bag error
Calculating the OOB Error for a Bagged Model:
1. For each bagged tree, predict the response for each ________________.
2. Summarize predictions and compute the OOB error as the ____________ for regression trees or the ___________________ for classification trees.
3. Graph the OOB error against the number of ________________.
out-of-bag observation
test MSE
test error rate
bagged trees (want the lowest OOB error)
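One way to obtain the OOB error in practice, assuming scikit-learn's BaggingRegressor (whose default base learner is a decision tree); the dataset and n_estimators are illustrative:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor

X, y = load_diabetes(return_X_y=True)

# oob_score=True makes each observation's prediction use only the trees
# whose bootstrap samples did not include that observation.
bagged = BaggingRegressor(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

# The stored OOB predictions give the OOB MSE for a regression model;
# oob_score_ is scikit-learn's own OOB summary score.
oob_mse = ((y - bagged.oob_prediction_) ** 2).mean()
print(bagged.oob_score_, oob_mse)
```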
Random Forests address a drawback of bagging:
Similar (highly correlated) bagged trees increase the correlation between predictions and diminish the variance-reducing power of bagging. Random forests decorrelate the trees.
Random Forests Steps:
1. Create _________________________ from the original training dataset.
2. Construct a __________________ for each bootstrap sample using ______________________. At each split, a random subset of ___________________ are considered.
3. Predict the response of a new observation by __________________________ (regression trees) or by _________________________________ (classification trees) across all b trees.
b bootstrap samples
decision tree
recursive binary splitting
k variables
averaging the predictions
using the most frequent category
Random Forests Properties:
1. __________________ is a special case of random forests.
2. Increasing b does not cause ____________________.
3. Decreasing k reduces the _________________________________.
Bagging (k=p)
overfitting
correlation between predictions
Random Forests k values:
Regression Trees
k =
= p/3
Random Forests k values:
Classification Trees
k =
= sqrt(p)
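A sketch of random forests with the k values above, assuming scikit-learn, where k is set through max_features; the datasets and n_estimators are illustrative, and setting max_features = p recovers bagging:

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification trees: consider roughly sqrt(p) predictors at each split.
Xc, yc = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                             oob_score=True, random_state=0).fit(Xc, yc)

# Regression trees: a common choice is about p/3 predictors per split.
Xr, yr = load_diabetes(return_X_y=True)
p = Xr.shape[1]
reg = RandomForestRegressor(n_estimators=500, max_features=max(1, p // 3),
                            oob_score=True, random_state=0).fit(Xr, yr)

print(clf.oob_score_, reg.oob_score_)
```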
OOB Error:
Bagging _______ Random Forests (not always though)
>
Performance:
Bagging _______ Random Forests
<
Boosting does not involve _______________.
bootstrapping
Boosting grows trees ______________ using information from previously grown trees.
sequentially
Boosting Steps:
Let z_1 be the actual response variable, y.
1. For k = 1, 2, ..., b:
• Use recursive binary splitting to fit a tree with d splits to the data with z_k as the response.
• Update z_k by subtracting λ • f̂_k(x), i.e. let z_(k+1) = z_k - λ • f̂_k(x).
2. Calculate the boosted model prediction as f̂(x) = Σ (k=1 to b) {λ • f̂_k(x)}.
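A from-scratch sketch of these boosting steps, assuming scikit-learn regression trees on its diabetes data; max_depth stands in for the d-split control, and the values of b, d, and λ (lam) are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
b, d, lam = 200, 2, 0.05   # number of trees, tree size control, shrinkage λ

trees = []
z = y.astype(float)                        # z_1 is the actual response y
for _ in range(b):
    # Fit a small tree to the current working response z_k ...
    tree = DecisionTreeRegressor(max_depth=d, random_state=0).fit(X, z)
    trees.append(tree)
    # ... then update: z_(k+1) = z_k - λ • f̂_k(x)
    z = z - lam * tree.predict(X)

# Boosted prediction: f̂(x) = Σ over k of λ • f̂_k(x)
f_hat = lam * sum(tree.predict(X) for tree in trees)
print(np.mean((y - f_hat) ** 2))           # training MSE of the boosted model
```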
Boosting Properties:
1. Increasing b can cause _________________.
2. Boosting reduces _________________.
3. d controls _________________ of the boosted model.
4. λ controls the ____________ at which boosting _________________.
overfitting
bias
complexity
rate
learns