Decision Tree
- Split Search - Examines all possible binary splits of the data along each predictor variable to select the split that most reduces some measure of node entropy.
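The split search described above can be sketched in a few lines. This is a minimal illustration for a single numeric predictor, not any particular library's implementation; the function names and toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(x, y):
    """Exhaustively try each binary threshold on one predictor and
    return the one that most reduces weighted child-node entropy."""
    best = (None, float('inf'))
    n = len(y)
    for t in sorted(set(x))[:-1]:  # candidate thresholds between distinct values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data (hypothetical): the class flips cleanly at x = 2,
# so splitting there leaves two pure children (zero entropy).
x = [1, 2, 3, 4]
y = ['no', 'no', 'yes', 'yes']
best_split(x, y)
```

A real tree learner repeats this search over every predictor at every node and picks the overall best split.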
- What do the circles represent in a decision tree - decisions (predicted outcomes)
- What do the rectangles represent in a decision tree - tests on an attribute
- Circles - terminal (leaf) nodes
- Top or starting node - root node
- Internal nodes - rectangles
- Information Gain (IG) - the reduction in entropy of a target variable after a dataset is split on an attribute
- Information Gain Formula - IG(Y,X) = E(Y) - E(Y|X)
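The formula IG(Y,X) = E(Y) - E(Y|X) can be computed directly: E(Y|X) is the entropy of Y within each group defined by X, weighted by group size. A minimal sketch (function names and toy data are assumptions, not from any particular library):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(y, x):
    """IG(Y, X) = E(Y) - E(Y|X): entropy drop after splitting y by x."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - conditional

# Toy data (hypothetical): x perfectly partitions the class,
# so the gain equals the full starting entropy of 1 bit.
y = ['yes', 'yes', 'no', 'no']
x = ['a', 'a', 'b', 'b']
information_gain(y, x)  # 1.0
```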
- What measure is information gain based on - entropy
- Entropy - expected amount of information
- How much information does an attribute give about the class (the target variable)? - Attributes that perfectly partition the classes give maximal information
- What is Entropy the measure of? - a measure of impurity (disorder)
- What is Entropy denoted as? - H(x) or E(x)
- What is the outcome if entropy = 0 - the outcome is certain
- What is the outcome if entropy is maximal - all outcomes are equally likely
- Entropy Formula - H(X) = -Σ P(x) log2 P(x)
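The entropy formula in one small function (a sketch; the function name is an assumption):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p * log2(p)) over outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

entropy([1.0])        # 0.0 bits — the outcome is certain
entropy([0.5, 0.5])   # 1.0 bit  — maximal for two equally likely outcomes
entropy([0.25] * 4)   # 2.0 bits — maximal for four equally likely outcomes
```

Note the convention that terms with p = 0 contribute nothing, handled here by skipping them.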
- Information value - the information content of an event E increases as the probability of the event occurring, p(E), decreases
- Information Value Formula - log2(1/p(E))
- What does the base represent in information value - the unit of information (base 2 gives bits)
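The information value formula log2(1/p(E)) directly shows that rarer events carry more information (the function name below is an assumption):

```python
import math

def information_value(p):
    """Self-information in bits: log2(1/p); rarer events carry more information."""
    return math.log2(1 / p)

information_value(0.5)    # 1.0 bit
information_value(0.125)  # 3.0 bits — a rarer event, so more information
```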
- What is the best way to choose the best split attribute - information gain
- What is used to calculate information gain - entropy
- Modeling essentials - determine type of prediction, select useful inputs and optimize complexity
- Modeling essentials for decision trees - prediction rules, split search and pruning
- Maximal Tree - The most complex model within a sequence of models with increasing complexity.
- Steps to Pruning Two Splits in a Maximal Tree - rate each subtree using validation assessment, select the subtree with the best assessment rating
- Steps for Subsequent Pruning - Continue pruning until all subtrees are considered, compare validation assessment among tree complexities
- Validation Assessments - Decision Optimization (Accuracy), Estimate Optimization (Squared Error)
- With binary target which two decision types do you consider - primary decision, secondary decision
- Minimize squared error - squared difference between target and prediction [(target - estimate)^2]
- 1 - p - the probability of the event not occurring
- p/(1-p) - the odds of the event happening
- Properties of odds - not symmetric, ranges from 0 to infinity, odds of 1 corresponds to a probability of 50%
- Properties of log odds - symmetric, ranging from minus infinity to plus infinity, equals 0 at a probability of 50%, highly negative for low probabilities and highly positive for high probabilities
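The contrast between odds and log odds is easy to see numerically (a small sketch; the function names are assumptions):

```python
import math

def odds(p):
    """Odds of an event: p / (1 - p); ranges from 0 to infinity."""
    return p / (1 - p)

def log_odds(p):
    """Log odds (logit): symmetric about 0, zero at p = 0.5."""
    return math.log(odds(p))

odds(0.5)       # 1.0 — even odds at 50% probability
odds(0.8)       # ≈ 4.0 — 4-to-1 odds
log_odds(0.5)   # 0.0 — zero at 50% probability
# Symmetry: log_odds(0.2) and log_odds(0.8) are equal in magnitude,
# opposite in sign, while odds(0.2) = 0.25 and odds(0.8) ≈ 4 are not.
```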