Ch 3: Decision trees

40 Terms

1

True or false: pre-processing is a standardized procedure that is independent of the model that will be used afterwards

False

2

True or False: One-hot encoding a categorical feature with originally 3 separate categories results in 3 new columns

True
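
A quick illustration with pandas (the DataFrame and the "color" column are made up for the example):

```python
import pandas as pd

# A categorical feature with 3 separate categories...
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# ...one-hot encodes into 3 new indicator columns, one per category.
encoded = pd.get_dummies(df, columns=["color"])
print(list(encoded.columns))  # ['color_blue', 'color_green', 'color_red']
```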

3

DT root (node)

the node where we have all our data (before any splitting); it has no parent node

4

DT branches 

connections between nodes (which are collections of data)

5

DT internal node

has parent and child nodes

6

Tree-based segmentation strategy

  • the sample is iteratively split into smaller subsamples

  • each time, all possible splits are evaluated

  • the best possible split is selected

7

the splitting decision: which two effects of a split are taken into account?

  • the reduction of impurity 

  • the number of observations in the resulting subsamples 

8

the splitting decision: the reduction of impurity

  • do we improve the homogeneity of the resulting subsamples compared to the homogeneity of the initial sample that is split?

  • how to evaluate impurity? (C4.5, Gini)

  • how to compare the impurity of the child nodes with the impurity of the parent node?

9

information gain

  • weighted decrease in impurity

  • IG = impurity(parent) − Σ_child p(child) × impurity(child)

  • p(child) = proportion of the parent node’s observations that end up in that child node
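
A minimal sketch tying the splitting cards together in Python, using Gini as the impurity measure (C4.5 would use entropy instead); the toy feature and labels at the bottom are made up for illustration:

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(y_parent, y_left, y_right):
    """Weighted decrease in impurity:
    impurity(parent) - sum over children of p(child) * impurity(child)."""
    n = len(y_parent)
    return (gini(y_parent)
            - len(y_left) / n * gini(y_left)
            - len(y_right) / n * gini(y_right))

def best_split(x, y):
    """Evaluate all possible splits (<) on a continuous feature
    and keep the one with the highest information gain."""
    best_t, best_gain = None, -np.inf
    for t in np.unique(x)[1:]:  # every observed value is a candidate threshold
        left = x < t
        gain = information_gain(y, y[left], y[~left])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # (4.0, 0.5): both child nodes are pure
```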

10

what is the goal of a predictive model

generalization

11

generalization

refers to a model’s ability to accurately predict unseen data

goal = finding patterns that generalize

(memorizing the training set does not make a good model; memorization ≠ learning a representative pattern)

12

how do we test if a model is a good model 

with the test set → measures generalization performance

13

Q: will a neural network always lead to better generalization performance than a linear regression?

we don’t know in advance; a neural network can in principle fit almost anything, but sometimes the linear model generalizes better → we need to test

14

model capacity / complexity

a model’s ability to remember information about its training data

usually not a formal term, but corresponds roughly to the number of trainable parameters, splits, …

→ more complex isn’t always better 

15

overfitting

too high complexity → memorization: i.e. the model learns the correct answer for every training example, but the learned pattern doesn’t generalize to novel examples / instances → no generalization

  • too complex models may fit the noise in the dataset

the general concept of too high capacity / complexity

16

but how do we regulate complexity to optimize generalization?

use holdout data

= creating a “lab test” of generalization performance

17

how accurate do we expect a model to be?

it depends; this is an engineering discipline, so you need the environment and the context → the model will rarely work better on the test set than on the training set

the distribution of the training set might not be the same as that of the new data (e.g. predictions over time)

you assume the distribution stays roughly the same, but if it changes, performance can degrade

18

examples of modeling decisions

  • choosing the best performing model (e.g. the best tree)

  • choosing the optimal k in kNN

  • selecting features

  • setting hyperparameters = parameters whose values control the learning process

19

hyperparameter 

= parameter whose value is used to control the learning process

20

modeling decisions are made on the … data

validation set

  • using training data → overfitting

  • using test data → no longer an unbiased estimate of the generalization error

21

golden rule

the test set must stay a representative sample for out-of-sample evaluation

= lock away the test set until after all modeling decisions have been taken

22

training data

to train the model → the largest subset of the total data

23

validation data

to make modeling decisions

24

test data

to obtain an unbiased estimate of out-of-sample performance
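
A minimal sketch of the three-way split with scikit-learn; the 60/20/20 proportions and the synthetic data are assumptions for illustration, not a fixed rule:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.random((100, 5)), rng.integers(0, 2, 100)

# Golden rule: carve off the test set first and lock it away.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Split the remainder into training and validation data
# (0.25 of the remaining 80% = 20% of the total).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```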

25

splitting decision: possible splits that are evaluated

  • continuous variables: all possible values (<)

  • categorical variables:

  • → nominal: all possible combinations of values (=)

  • → ordinal: all possible values (<)

  • missing values: can be assigned to the child node that maximizes information gain (or kept as a separate group)

26

why do we have to stop growing the tree before we reach minimum impurity?

impurity and information gain are based on the training set only, so if you add too much complexity you start overfitting, and the tree no longer fits new data

27

stopping decision : possible approaches

information gain above a minimum threshold 

maximum size of the tree:

  • number of ‘levels’

  • number of leaves = number of rules = number of segments

minimum number of observations per leaf 
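
Under the assumption that scikit-learn is used, these stopping approaches map directly onto hyperparameters of DecisionTreeClassifier; the values below are illustrative, not recommendations:

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    min_impurity_decrease=0.01,  # information gain above a minimum threshold
    max_depth=5,                 # maximum number of 'levels'
    max_leaf_nodes=20,           # number of leaves = rules = segments
    min_samples_leaf=30,         # minimum number of observations per leaf
)
```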

28

the tree should generalize well ie it should accurately predict new, unseen observations BUT

a too large tree resulting in a complex decision boundary = an overfitted tree

29

to decide when to stop growing the tree

we want to maximize generalization performance

using the validation set: a random subsample of the training data (typically 30%)

30

which data set do we use to decide when to stop in our decision tree

the validation set

31

evaluation of the stopping decision 

  • classification accuracy: % of correctly predicted observations (also called PCC: percentage correctly classified)

  • misclassification error: % of incorrectly predicted observations
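
In code, with made-up predictions for illustration:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])

accuracy = np.mean(y_true == y_pred)  # PCC: 4/5 correct = 0.8
error = 1.0 - accuracy                # misclassification error = 0.2
print(accuracy, error)
```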

32

potential problem with early-stopping

the validation-performance curve may be non-convex, so early stopping can get stuck at a local optimum

→ solution: pruning

33

pruning :

grow the full tree, with accuracy on the training set = 100%

cut branches to optimize performance on the validation set
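
A minimal sketch, assuming scikit-learn: sklearn implements minimal cost-complexity pruning (via ccp_alpha) rather than direct branch cutting, but choosing the pruning strength on the validation set matches the idea on this card. The data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 4)), rng.integers(0, 2, 200)
X_val, y_val = rng.random((80, 4)), rng.integers(0, 2, 80)

# Grow the full tree, then compute its pruning path.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

# Pick the pruning level that performs best on the validation set.
best_alpha, best_acc = 0.0, -1.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    acc = pruned.fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
```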

34

validation set for deciding on the size of the decision tree:

= tuning the decision tree size

remember:

  • often used in data science: tuning hyperparameters

  • e.g. the number of neurons in an artificial neural network, or the number of trees in an ensemble

35

assignment decision 

simple: majority voting

better: probability of being good / bad

36

majority voting

class of most of the observations in a leaf
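
A tiny illustration of both assignment options, with made-up leaf contents (7 "good" and 3 "bad" training observations in one leaf):

```python
import numpy as np

leaf_labels = np.array(["good"] * 7 + ["bad"] * 3)

# Majority voting: assign the class of most observations in the leaf.
classes, counts = np.unique(leaf_labels, return_counts=True)
majority = classes[np.argmax(counts)]    # 'good'

# Better: keep the probability of being good / bad.
p_good = np.mean(leaf_labels == "good")  # 0.7
print(majority, p_good)
```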

37

interaction effects

  • when the relation between predictor A and the target variable depends on the value of another predictor B 

  • → then there is an interaction between predictors A and B 

  • meaning: for different subsamples, other variables explain the outcome in terms of the target variable

  • CHAID: CHi-squared Automatic Interaction Detection

38

advantages of decision trees

interpretable : rules 

non-parametric 

robust with respect to input data 

  • missing values

  • outliers

  • variable selection

  • categorical and continuous variables

39

disadvantages of decision trees

sensitive to changes in the training data: a weak classifier

  • a different split into training and validation sets can yield a different tree; are the derived relations unstable?

  • sensitive to imbalanced class distributions

  • limited predictive power

40

uses of decision trees

as a predictive model: not recommended

data exploration: no preprocessing required

variable selection in a preprocessing step, using information gain

segmentation prior to development of the final predictive models

  • segments are found based on interactions, so the segmentation is meaningful

  • coarse classification / binning