True or false : pre-processing is a standardized procedure that is independent of the model that will be used afterwards
False
True or False : One-hot encoding a categorical feature with originally 3 separate categories results in 3 new columns
True
DT root (node)
the node containing all our data (before any split); it has no parent node
DT branches
connections between nodes (collections of data)
DT internal node
has both a parent node and child nodes
Tree-based segmentation strategy
the sample is iteratively split into smaller samples
each time, all possible splits are evaluated
the best possible split is selected
the splitting decision : which two effects of a split are taken into account
the reduction of impurity
the number of observations in the resulting subsamples
the splitting decision : the reduction of impurity
do we improve the homogeneity of the resulting subsamples compared to the homogeneity of the initial sample that is split
how to evaluate impurity ? entropy (used in C4.5), Gini (used in CART)
how to compare impurity of child nodes with impurity of parent node ?
information gain
weighted decrease in impurity
= impurity(parent) − Σ P(child) × impurity(child)
P(child) = proportion of the parent node's observations that fall into that child node
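The gain formula above can be sketched in a few lines of Python, using Gini as the impurity measure (the function names are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Weighted decrease in impurity:
    impurity(parent) - sum(P_child * impurity(child))."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)

# A perfectly pure split of a 50/50 parent gives the maximum gain: 0.5
parent = ["good"] * 5 + ["bad"] * 5
gain = information_gain(parent, [["good"] * 5, ["bad"] * 5])
print(round(gain, 2))  # 0.5
```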
what is the goal of a predictive model
generalization
generalization
refers to a model’s ability to accurately predict unseen data
goal = finding patterns that generalize
(memorizing the training set data does not make a good model; memorization ≠ learning a representative pattern)
how do we test if a model is a good model
with the test set→ measures generalization performance
Q : will a neural network always lead to better generalization performance than a linear regression ?
we don’t know; a neural network can fit almost anything, but sometimes the linear model generalizes better → we need to test
model capacity / complexity
ability to remember information about its training data.
usually not a formal term but corresponds roughly to the number of trainable parameters, splits …
→ more complex isn’t always better
overfitting
too high complexity leads to memorization, i.e. the model learns the correct answer for every training example, but the learned pattern doesn’t generalize to novel examples / instances → no generalization
too complex models may fit the noise in the dataset
general concept of too high capacity / complexity
but how to regulate complexity to optimize generalization ?
use holdout data
= creating a “lab test” of generalization performance
how accurate do we expect a model to be
it depends; this is an engineering discipline, so it depends on the environment and the context → a model will rarely perform better on the test set than on the training set
the distribution of the training set might not be the same as that of new data (e.g. predictions over time)
we assume the distribution stays roughly the same, but if it changes, performance can degrade
examples of modeling decisions
choosing the best performing model (eg best tree)
choose optimal k in KNN
select features
setting hyperparameters = parameter whose value is used to control the learning process
hyperparameter
= parameter whose value is used to control the learning process
modeling decisions are made on the … data
validation set
training data → overfitting
test data → no longer unbiased estimate of generalization error
golden rule
test set must stay a representative sample for out-of-sample evaluation
= lock away the test set until after all modeling decisions have been taken
training data
to train the model → largest subset of total data
validation data
to make modeling decisions
test data
to obtain an unbiased estimate of out-of-sample performance
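The three-way split described in the cards above can be sketched with the standard library; the 15%/15% fractions below are illustrative, not from the source:

```python
import random

def three_way_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle and split into training (largest), validation, and test sets."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = [data[i] for i in idx[:n_test]]
    val = [data[i] for i in idx[n_test:n_test + n_val]]
    train = [data[i] for i in idx[n_test + n_val:]]
    return train, val, test

train, val, test = three_way_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```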
splitting decision : possible splits that are evaluated
continuous variables : all possible values (<)
categorical variables :
→ nominal : all possible combinations of values (=)
→ ordinal : all possible values (<)
missing values : can be added to the child node to maximize information gain (separate group)
why do we have to stop growing the tree before we reach minimum impurity ?
impurity and information gain are based on the training set only, so if you add too much complexity, you start overfitting and the tree no longer fits new data
stopping decision : possible approaches
information gain above a minimum threshold
maximum size of the tree :
number of ‘levels’
number of leaves = number of rules = number of segments
minimum number of observations per leaf
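The stopping criteria above map directly onto tree hyperparameters; a sketch using scikit-learn's parameter names (assuming scikit-learn is available; the specific values and dataset are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each stopping criterion corresponds to one hyperparameter:
tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum number of 'levels'
    max_leaf_nodes=8,            # maximum number of leaves (= rules = segments)
    min_samples_leaf=5,          # minimum number of observations per leaf
    min_impurity_decrease=0.01,  # information gain above a minimum threshold
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```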
the tree should generalize well, i.e. it should accurately predict new, unseen observations BUT
too large a tree results in a complex decision boundary = overfitted tree
to decide when to stop growing the tree
we want to maximize generalization performance
using the validation set : random subsample of the training data (typically 30%)
which data set do we use to decide when to stop in our decision tree
the validation set
evaluation of the stopping decision
classification accuracy : % of correctly predicted observations (also called PCC : percentage correctly classified)
misclassification error : % of incorrectly predicted observations
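Both evaluation metrics can be computed in a few lines of Python (the labels below are made up for the example):

```python
def accuracy(y_true, y_pred):
    """Percentage correctly classified (PCC)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["good", "bad", "good", "good", "bad"]
y_pred = ["good", "good", "good", "bad", "bad"]
acc = accuracy(y_true, y_pred)
print(acc, 1 - acc)  # accuracy 0.6, misclassification error 0.4
```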
potential problem with early-stopping
non convex curve
→ solution : pruning
pruning :
grow full tree, with accuracy on the training set = 100%
cut branches to optimize performance on the validation set
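A sketch of "grow the full tree, then prune to optimize validation performance": scikit-learn exposes pruning via cost-complexity pruning (`ccp_alpha`) rather than direct branch cutting, so here the pruning strength is chosen on a held-out validation set (dataset and split are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Grow the full tree, then get the candidate pruning strengths
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
alphas = full.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Pick the pruned tree that maximizes validation accuracy
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in alphas[:-1]),  # the last alpha prunes down to a single leaf
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_n_leaves(), "<=", full.get_n_leaves())
```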
validation set for deciding on the size of the decision tree :
= tuning the decision tree size
remember :
often used in data science : tuning hyperparameters
e.g. number of neurons in an artificial neural network, number of trees in an ensemble
assignment decision
simple : majority voting
better : probability to be good / bad
majority voting
class of most of the observations in a leaf
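A minimal sketch of the assignment decision in a single leaf, combining majority voting with class probabilities (labels are illustrative):

```python
from collections import Counter

def leaf_prediction(leaf_labels):
    """Majority vote plus class probabilities for one leaf."""
    counts = Counter(leaf_labels)
    majority = counts.most_common(1)[0][0]          # simple: majority voting
    probs = {cls: n / len(leaf_labels)               # better: class probabilities
             for cls, n in counts.items()}
    return majority, probs

label, probs = leaf_prediction(["good", "good", "bad", "good", "bad"])
print(label, probs)  # good {'good': 0.6, 'bad': 0.4}
```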
interaction effects
when the relation between predictor A and the target variable depends on the value of another predictor B
→ then there is an interaction between predictors A and B
meaning : for different subsamples, different variables explain the outcome in terms of the target variable
CHAID : Chi-squared Automatic Interaction Detection
advantages of decision trees
interpretable : rules
non-parametric
robust with respect to input data
missing values,
outliers,
variable selection
categorical and continuous variables
disadvantages of decision trees
sensitive to changes in the training data : weak classifier
a different split into training and validation sets possibly yields a different tree; derived relations are unstable
sensitive to imbalanced class distributions
predictive power
uses of decision trees
as a predictive model : not recommended
data exploration : no preprocessing required
variable selection in a preprocessing step using information gain
segmentation prior to development of the final predictive models
segments are found based on interaction, so meaningful segmentation
coarse classification / binning