Decision Trees with SAS Ch. 22


32 Terms

1

Start with:

Data setup

  • trees can handle missing values

  • note: trees don’t have distributional assumptions (unlike regression)

2

With decision tree building we start with:

PROC HPSPLIT

3

PROC HPSPLIT

  • Seed= makes the results reproducible (fixes the random number stream)

  • Class- specify classification variables

    – Include the DV this time!
    • Because DTs can also handle interval DVs, listing the DV in CLASS tells SAS to treat it as categorical

  • Model statement like before (order can change the tree, but not the goal)

    – The tree may build differently based on the order of inputs, but we are looking for decisions, not exact answers!!!

  • Grow- default method is entropy

  • Prune- default method is cost-complexity

    – Balances error rate against simplicity

  • Rules file- writes the splitting rules of the tree to a file

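The statements above can be sketched as a minimal PROC HPSPLIT call; the data set, DV, and input names (donors, target_b, etc.) and the rules file path are hypothetical:

```sas
/* Minimal classification tree; SEED= fixes the random stream for reproducibility */
proc hpsplit data=donors seed=12345;
   class target_b home_owner;           /* include the DV so it is treated as categorical */
   model target_b = last_gift_amt home_owner months_since_gift;
   grow entropy;                        /* default growth criterion */
   prune costcomplexity;                /* default pruning: error rate vs. simplicity */
   rules file='tree_rules.txt';         /* write the splitting rules to a file */
run;
```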
4

Looking at the tree….

  • Node is the Node ID number

  • N represents the number of observations in that node

  • 2 represents the classification and the % of the variable most common in that node

  • Below the line is the breakdown of all classifications

5

Fit

  • Confusion Matrix to calculate fit

  • ROC Curve, just like before

  • Variable importance shows the importance of each variable in the tree

6

Rules of the leaves

  • (Only shows the leaf nodes)

  • If the last gift amount is missing or less than $18, they are predicted to give

  • If the last gift amount is $18 or more, they are predicted to not give this time.

7

Subtrees (Pruning)

• Selecting an earlier iteration of a tree to avoid overfitting

• Removing branches that have few observations

– Leaf size, Maximum Depth, Method properties

• Test other trees
– Select a different maximum depth
– Change the maximum number of branches

8

More options → Which are endless!!!!

  • assignmissing=

  • Maxbranch=

  • Maxdepth=

  • Grow

  • Prune

  • Partition

  • Score

9

Assignmissing=

– Branch- create a separate branch for missing

– None- remove from analysis (default)
– Popular- assign to the largest child
– Similar- statistically determine the most similar
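
A hedged sketch combining these options (the data set and variable names are hypothetical):

```sas
/* Missing values get their own branch; tree size is capped while growing */
proc hpsplit data=donors seed=12345 assignmissing=branch maxbranch=2 maxdepth=5;
   class target_b;
   model target_b = last_gift_amt months_since_gift;
run;
```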

10

Maxbranch=

  • Maximum number of branches (child nodes) per split

11

Maxdepth=

  • Maximum levels of the tree

12

grow

  • Chaid- uses a chi-square estimate

13

Prune

  • Rep- reduced error pruning

14

Partition

  • Splits the data into training and validation partitions for you!

15

Score

  • Creates a score data set just like regression
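
One way to sketch partitioning and scoring together, using the OUTPUT statement to create the scored data set (names are hypothetical):

```sas
proc hpsplit data=donors seed=12345;
   class target_b;
   model target_b = last_gift_amt months_since_gift;
   partition fraction(validate=0.3);   /* hold out 30% of rows for validation */
   output out=scored;                  /* predictions written out, like regression scoring */
run;
```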

16

More options include examples like:

[image of example option syntax]
17

How do you determine the best tree?

Through the subtree assessment plot → look for where the training and validation curves diverge

18

What does the score data set do?

Allows us to evaluate the model

19

What are the consequences of a decision tree?

Look to the confusion matrix! → Look for similarity across partitions for fit

20

We can look deeper into the

Confusion matrix and calculate:

  • misclassification rate

  • accuracy

  • Precision

  • specificity

  • sensitivity/ recall

  • harmonic mean (F1 score)

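Each of these can be computed directly from the four cells of the confusion matrix; a sketch with made-up counts (the TP/FP/TN/FN values are hypothetical):

```sas
data fit_metrics;
   tp = 40; fp = 10; tn = 35; fn = 15;          /* hypothetical 2x2 confusion matrix */
   n  = tp + fp + tn + fn;
   accuracy          = (tp + tn) / n;
   misclassification = 1 - accuracy;
   precision         = tp / (tp + fp);
   sensitivity       = tp / (tp + fn);          /* a.k.a. recall */
   specificity       = tn / (tn + fp);
   f1 = 2 * precision * sensitivity / (precision + sensitivity);   /* harmonic mean */
run;

proc print data=fit_metrics; run;
```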
21

We also can look at the ROC chart to

assess the overall fit of the model

22

Potential improvements: Business understanding

Reevaluate the business question, evaluate the appropriateness of the DV

23

Potential improvements: Data understanding

  • Consider missingness, data inclusion criteria

24

Potential Improvements: Data Prep

Consider outliers, transformations, record selection, modeling assumptions

25

Potential Improvements: Modeling

  • Evaluate feature selection, significance, parsimony

26

Potential Improvements: Evaluation

Evaluate significance, generalizability (data partitioning), explainability, if it is actionable

27

Potential Improvements: Deployment

Make the model more actionable

28

How can we build a better tree?

change the data prep and change the model

29

How do we change the data prep?

Missing & Outliers

30

How do we change the model?

Feature Inclusion

– Branches
– Depth
– Grow method
– Pruning method

31

What should we consider with data mining?

  • Remember, the goal of data mining is to find unexpected patterns

  • If the tree does not tell you something you don’t know, it is not very insightful

    – Can be helpful if trying to prove your argument or gain confidence

  • If the tree is overfit, it might not be that helpful

    – Not generalizable
    – Think about baseball pitches!

32

What about continuous dependent variables?

  • DTs are easier to interpret when used for classification

  • BUT they can be used with a continuous DV

  • Creates a predicted value instead of classification

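For an interval DV the sketch is the same, except the DV stays out of the CLASS statement and the tree outputs predicted values (names are hypothetical):

```sas
/* Regression tree: the interval DV (gift_amt) is NOT listed in CLASS */
proc hpsplit data=donors seed=12345;
   class home_owner;                    /* only categorical inputs here */
   model gift_amt = last_gift_amt home_owner months_since_gift;
   output out=predicted;                /* predicted values instead of classifications */
run;
```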