Split Search - Examines all possible binary splits of the data along each predictor variable to select the split that most reduces some measure of node entropy.
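The split search over a single numeric predictor can be sketched as follows; function names are illustrative, and entropy is used as the measure of node impurity:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_binary_split(xs, ys):
    """Try every candidate threshold on one numeric predictor and return
    the split that most reduces the weighted entropy of the child nodes."""
    best = (None, entropy(ys))  # (threshold, resulting impurity)
    for t in sorted(set(xs))[:-1]:  # exclude max so the right child is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # weighted average entropy of the two children
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best
```

For example, splitting `[1, 2, 3, 4]` against labels `['a', 'a', 'b', 'b']` selects the threshold 2, which separates the classes perfectly (child entropy 0).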
What do the circles represent in a decision tree - decisions
What do the rectangles represent in a decision tree - tests
Circles - terminal (leaf) nodes
Top or starting node - root node
Internal nodes - rectangles
Information Gain (IG) - the reduction in entropy of a target variable after a dataset is split on an attribute
Information Gain Formula - IG(Y,X) = E(Y) - E(Y|X)
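The formula IG(Y,X) = E(Y) - E(Y|X) can be sketched in code, where E(Y|X) is the conditional entropy: the entropy of Y averaged over the groups induced by each value of X (function names are illustrative):

```python
import math
from collections import Counter

def entropy(values):
    """E(Y): Shannon entropy (in bits) of a sequence of outcomes."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(ys, xs):
    """IG(Y, X) = E(Y) - E(Y|X), where E(Y|X) weights each group's
    entropy by the fraction of records falling in that group."""
    n = len(ys)
    cond = 0.0
    for x in set(xs):
        group = [y for y, xv in zip(ys, xs) if xv == x]
        cond += (len(group) / n) * entropy(group)
    return entropy(ys) - cond
```

An attribute that perfectly partitions the target gains the full entropy (here 1 bit); one that tells us nothing gains 0.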
What measure is information gain based on - entropy
Entropy - expected amount of information
How much information does ‘an attribute‘ give about ‘the class (a target variable)‘? - Attributes that perfectly partition the data should give maximal information
What is Entropy the measure of? - Measure of impurity(disorder)
What is Entropy denoted as? - H(x) or E(x)
What is the outcome if entropy = 0 - the outcome is ‘certain‘
What is the outcome if entropy = maximum - all outcomes are equally likely
Entropy Formula - H(X) = -Σ P(x)log2P(x)
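The two boundary cases above (entropy 0 for a certain outcome, maximum entropy for equally likely outcomes) can be checked with a minimal sketch of the formula:

```python
import math

def entropy(probs):
    """H(X) = -sum P(x) * log2 P(x), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A certain outcome carries no surprise: entropy([1.0]) is 0.
# Two equally likely outcomes maximize entropy: entropy([0.5, 0.5]) is 1 bit.
```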
Information value - the information content of an event E grows as the probability of the event ‘p(E)‘ decreases; rare events carry more information
Information Value Formula - log2(1/p(E))
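The formula log2(1/p(E)) is direct to compute; the function name is illustrative:

```python
import math

def information_value(p):
    """log2(1/p): bits of information carried by an event of probability p."""
    return math.log2(1 / p)

# Rarer events carry more information:
# an event with p = 0.5 carries 1 bit; an event with p = 0.125 carries 3 bits.
```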
What does the base represent in information value - the unit of information (e.g., base 2 measures information in bits)
What is the best way to identify the best attribute to split on - information gain
What is used to calculate information gain - entropy
Modeling essentials - determine type of prediction, select useful inputs and optimize complexity
Modeling essentials for decision trees - prediction rules, split search and pruning
Maximal Tree - The most complex model within a sequence of models with increasing complexity.
Steps to Pruning Two Splits in a Maximal Tree - rate each subtree using validation assessment, select the subtree with the best assessment rating
Steps for Subsequent Pruning - Continue pruning until all subtrees are considered, compare validation assessment among tree complexities
With binary target which two decision types do you consider - primary decision, secondary decision
Minimize squared error - squared difference between target and prediction [(target - estimate)^2]
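The squared-error criterion [(target - estimate)^2] can be sketched as follows; function names are illustrative:

```python
def squared_error(target, estimate):
    """Squared difference between target and prediction: (target - estimate)^2."""
    return (target - estimate) ** 2

def mean_squared_error(targets, estimates):
    """Average squared error over a set of predictions; this is the quantity
    a regression tree minimizes when each leaf predicts its node mean."""
    return sum(squared_error(t, e) for t, e in zip(targets, estimates)) / len(targets)
```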
1-p - the probability of the event not occurring
p/(1-p) - the odds of the event happening
Properties of odds - not symmetric; ranges from 0 to infinity; equals 1 when the probability is 50%
Properties of log odds - symmetric; ranges from minus infinity to positive infinity (like a line); equals 0 when the probability is 50%; highly negative for low probabilities and highly positive for high probabilities
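The contrast between odds and log odds can be verified numerically; function names are illustrative:

```python
import math

def odds(p):
    """p / (1 - p): ranges over (0, infinity); equals 1 at p = 0.5."""
    return p / (1 - p)

def log_odds(p):
    """log(p / (1 - p)): symmetric around 0; equals 0 at p = 0.5."""
    return math.log(p / (1 - p))

# Odds are not symmetric: odds(0.9) is 9, but odds(0.1) is only ~0.111.
# Log odds are symmetric: log_odds(0.9) and log_odds(0.1) are mirror
# images around 0 (approximately +2.197 and -2.197).
```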