10 - Decision Trees

Overview of Machine Learning
  • Machine Learning: A subset of artificial intelligence focused on enabling systems to learn from data and improve over time without explicit programming.

Types of Learning
  • Supervised Learning:

    • Uses labeled data.

    • Predicts outcomes based on input-output pairs.

    • Requires prior knowledge of classes.

  • Unsupervised Learning:

    • Uses unlabeled data.

    • Seeks to find hidden patterns or intrinsic structures without predefined categories.

Classification in Machine Learning
  • Classification: The task of predicting a data point's category from its features. The goal is to assign a label from a set of predefined categories.

  • Classifier: An algorithm that assigns a piece of data to one of multiple predefined classes.

Applications of Classification
  • Spam detection

  • Fraud detection

  • Object recognition

  • Medical diagnostics

  • Image classification

Characteristics of Classification Algorithms
  • Supervised algorithms utilize labeled training datasets to make predictions.

  • Classification is a subset of supervised learning that involves predicting categorical labels.

Classification Algorithms
  • Common algorithms include:

    • Logistic Regression

    • Decision Trees

    • Support Vector Machines

    • k-Nearest Neighbors

Introduction to Decision Trees
  • Decision Trees: A flowchart-like structure used to make decisions or predict outcomes based on data features.

    • Components:

      • Root Node: The top decision node, representing the best predictor.

      • Internal Nodes: Decision nodes that split the data further.

      • Leaf Nodes: Terminal nodes providing the final classification or decision.

    • Example:

      • Deciding whether to bring an umbrella based on cloudiness.
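The umbrella example amounts to a single root test with two leaves, which can be written as plain branching logic (the feature name is an illustrative assumption):

```python
def bring_umbrella(cloudy: bool) -> bool:
    # Root node: test the single feature "cloudiness".
    if cloudy:
        return True   # leaf: bring the umbrella
    return False      # leaf: leave it at home
```

Each path from the root to a leaf corresponds to one possible decision.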

Building a Decision Tree
  1. Select the best attribute as a splitting criterion.

  2. Split the dataset based on this attribute.

  3. Continue splitting recursively until stopping criteria are met (e.g., all data classified, no remaining features, maximum depth reached).
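The three steps above can be sketched as a short recursive function. This is a minimal, self-contained illustration, not a prescribed implementation: the data layout (a list of `(feature_dict, label)` pairs) and the entropy-based gain criterion are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature):
    """Entropy reduction from splitting rows on one feature."""
    labels = [y for _, y in rows]
    groups = {}
    for fd, y in rows:
        groups.setdefault(fd[feature], []).append(y)
    child = sum(len(ys) / len(rows) * entropy(ys) for ys in groups.values())
    return entropy(labels) - child

def build_tree(rows, features, max_depth=3, depth=0):
    """rows: list of (feature_dict, label) pairs."""
    labels = [y for _, y in rows]
    # Stopping criteria: pure node, no features left, or max depth reached.
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]           # leaf: majority label
    best = max(features, key=lambda f: information_gain(rows, f))   # step 1
    node = {"feature": best, "branches": {}}
    for value in sorted({fd[best] for fd, _ in rows}):              # step 2
        subset = [(fd, y) for fd, y in rows if fd[best] == value]
        node["branches"][value] = build_tree(                       # step 3
            subset, [f for f in features if f != best], max_depth, depth + 1)
    return node
```

Each recursive call handles one subset of the data, so the tree grows until every branch hits a stopping criterion.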

Types of Decision Trees
  • Binary Trees: Each node has at most two children.

  • Ternary Trees: Each node has at most three children.

  • N-ary Trees: Each node may have up to N children, for arbitrary N.

Measures of Impurity
  • Impurity measures the degree of heterogeneity in a dataset. Common measures include:

    • Gini Index: The probability of misclassifying a randomly chosen sample if it were labeled according to the node's class distribution.

    • Entropy: Measures the uncertainty in the data; goal is to minimize it during splits.
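Both measures follow directly from the class proportions p_i at a node: Gini = 1 − Σ p_i² and entropy = −Σ p_i log₂ p_i. A short sketch:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2); 0 for a pure node, higher when mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def shannon_entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)); 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

For a pure node both measures are 0; a 50/50 binary node gives a Gini of 0.5 and an entropy of 1.0, so both peak when the classes are maximally mixed.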

Decision Tree Splitting
  • Splits are evaluated based on impurity measures:

    • Gini impurity and entropy guide the selection of the best attribute to split on.

  • Information Gain (IG) indicates how much entropy is reduced by a particular split.

Selecting the Root Node
  • The root node is chosen based on the feature that maximizes Information Gain, thus reducing uncertainty the most.
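A minimal sketch of that selection on toy weather data (the feature names and labels are hypothetical, chosen for illustration): compute the Information Gain of each candidate feature and take the maximum.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Parent entropy minus the weighted entropy of the children after the split."""
    groups = {}
    for fd, y in zip(rows, labels):
        groups.setdefault(fd[feature], []).append(y)
    child = sum(len(ys) / len(labels) * entropy(ys) for ys in groups.values())
    return entropy(labels) - child

# Hypothetical data: "cloudy" perfectly separates the labels, "windy" does not.
rows = [{"cloudy": "yes", "windy": "no"},
        {"cloudy": "yes", "windy": "yes"},
        {"cloudy": "no",  "windy": "yes"},
        {"cloudy": "no",  "windy": "no"}]
labels = ["umbrella", "umbrella", "none", "none"]

root = max(["cloudy", "windy"], key=lambda f: information_gain(rows, labels, f))
# "cloudy" yields IG 1.0 (its children are pure); "windy" yields IG 0.0.
```

Here "cloudy" becomes the root because splitting on it removes all remaining uncertainty about the label.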

Characteristics of Decision Trees
  • Computationally inexpensive to build.

  • Fast classification for new records.

  • Intuitive and easy to interpret for non-technical stakeholders.

  • Robust against noise in data, especially with pruning to prevent overfitting.

  • Usable with both numerical and categorical data.

Pros of Decision Trees
  • Easy to Understand: The structure is intuitive, making it easy for non-technical stakeholders to interpret.

  • Minimal Data Preparation: They do not require normalization or scaling of features.

  • Handles Both Numerical and Categorical Data: Decision trees can work with different types of data, which makes them versatile.

  • Robust: They can handle missing values and are less affected by outliers.

  • Visual Representation: The flowchart-like structure helps to visualize the decision-making process clearly.

Cons of Decision Trees
  • Overfitting: They can create overly complex trees that do not generalize well to unseen data if not properly pruned.

  • Instability: Small changes in data can result in a completely different structure, making the model sensitive.

  • Bias with Imbalanced Data: Decision trees can be biased if one class dominates the dataset, leading to poor predictions for minority classes.

  • Limited Expressiveness: They are less effective for capturing complex relationships compared to other algorithms like ensemble methods or neural networks.

When to Use Decision Trees
  • When Interpretability is Crucial: If stakeholders need to understand the decision-making process clearly.

  • For Preliminary Data Analysis: To gather insights about the data and explore relationships before applying more complex models.

  • If the Data is Mixed-Type: When there are both categorical and numerical features in the dataset.

  • In Cases of Missing Values: They can work well when some data points are incomplete.

  • In Applications Requiring Fast Inference: When predictions need to be made quickly, such as real-time systems.