03 Classification Trees
Prediction method: majority vote. Each observation in a region is assigned to the most commonly occurring class among the training observations in that region.
Growing a Classification Tree
Recursive Binary Splitting
Replace the RSS with the classification error rate when making splits:
Classification Error = number of misclassified training observations in a region.
Classification Error Rate = fraction of observations in the region that do not belong to the majority class, i.e. 1 - (proportion of the majority class).
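The error rate above can be sketched directly from class counts; a minimal illustration (the fruit labels are just example data):

```python
from collections import Counter

def classification_error_rate(labels):
    """Fraction of observations in a region that do not belong
    to the majority class: 1 - (proportion of majority class)."""
    counts = Counter(labels)
    majority = max(counts.values())
    return 1 - majority / len(labels)

# A region with 6 apples and 2 pears: the majority class covers 75%.
print(classification_error_rate(["apple"] * 6 + ["pear"] * 2))  # 0.25
```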
Measures for Assessing Node Quality
Node Purity Metrics
Evaluating node purity is important because purer nodes yield more confident predictions:
Classification Error Rate: fraction of misclassified observations in the node; E = 1 - max_k p_k, where p_k is the proportion of class k in the node.
Gini Index: probability that a randomly chosen element from the node would be incorrectly labeled if labeled at random according to the node's class proportions; G = sum_k p_k (1 - p_k).
Entropy: measures disorder (impurity) within the node; D = -sum_k p_k log2(p_k).
Characteristics of Node Purity
Smaller values of the Gini Index and Entropy indicate higher node purity (clearer predictions).
Purity is best when all objects in a node belong to the same class (100% purity).
Purity is worst when the classes are evenly distributed (e.g., 50% apples and 50% pears).
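The best-case and worst-case behavior of all three metrics can be checked numerically; a small sketch (function and label names are illustrative):

```python
import math
from collections import Counter

def node_metrics(labels):
    """Return (error rate, Gini index, entropy) for one node."""
    n = len(labels)
    props = [c / n for c in Counter(labels).values()]
    error = 1 - max(props)
    gini = sum(p * (1 - p) for p in props)
    entropy = -sum(p * math.log2(p) for p in props)
    return error, gini, entropy

# A pure node scores 0 on all three metrics; a 50/50 node hits the
# two-class maximum of each.
print(node_metrics(["apple"] * 4 + ["pear"] * 4))  # (0.5, 0.5, 1.0)
```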
Growing and Pruning the Tree
Growing Steps
Use the Gini Index or Entropy during recursive binary splitting; both are more sensitive to node purity than the classification error rate, so they favor splits that improve prediction confidence.
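A split search over one numeric feature can be sketched as choosing the threshold that minimizes the size-weighted Gini of the two children (a toy version; real implementations search all features):

```python
from collections import Counter

def gini(labels):
    """Gini index of one node: sum_k p_k * (1 - p_k)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_numeric_split(xs, ys):
    """Scan candidate thresholds on a single feature and return the
    (threshold, score) minimizing the size-weighted child Gini."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must produce two non-empty regions
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = ["apple"] * 3 + ["pear"] * 3
print(best_numeric_split(xs, ys))  # (3, 0.0): x <= 3 separates the classes
```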
Pruning Steps
Use the Classification Error Rate during pruning, since prediction accuracy of the final tree is the goal; remove branches that do not improve accuracy.
Handling Qualitative Input Variables
Qualitative (categorical) input variables can also be split with a binary rule:
Partition the categories into two subsets rather than comparing against a numerical threshold.
Send each observation left or right according to which subset its category falls in.
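The candidate subset splits for a categorical variable can be enumerated explicitly; a small sketch (for K categories there are 2^(K-1) - 1 distinct binary partitions):

```python
from itertools import combinations

def binary_category_splits(categories):
    """Enumerate the distinct binary partitions of a category set;
    each split sends one subset left and the rest right."""
    cats = sorted(categories)
    splits = []
    for r in range(1, len(cats)):
        for subset in combinations(cats, r):
            rest = tuple(c for c in cats if c not in subset)
            # Skip a partition already listed with sides swapped.
            if (rest, subset) not in splits:
                splits.append((subset, rest))
    return splits

# 3 categories yield 2^2 - 1 = 3 distinct binary splits.
for left, right in binary_category_splits(["red", "green", "blue"]):
    print(left, "|", right)
```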