W3: Classification: Decision Tree and Performance Evaluation


30 Terms

1

Classification

supervised learning technique in machine learning and statistics where the goal is to assign data points (objects) to predefined categories (classes) based on their features (attributes).

  • Objective: Predict the class label of an object based on its features.

  • Input: A dataset with labeled examples (features and corresponding class labels).

  • Output: A model that can predict the class label for new, unseen data.
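A minimal sketch of this input/output idea in Python with scikit-learn (the library choice, feature values, and labels below are made-up assumptions, not part of the card):

```python
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: each row is an object's features, each entry in y its class label.
X_train = [[2.0, 30], [1.5, 45], [3.2, 12], [0.9, 60]]
y_train = ["cat", "dog", "cat", "dog"]

model = DecisionTreeClassifier()   # the classifier to be learned
model.fit(X_train, y_train)        # learn the feature-to-label mapping

# Output: a model that predicts the class label for new, unseen data.
print(model.predict([[2.1, 28]]))
```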

2

Why classification?

Predicting the class or category of an action to enable timely and cost-efficient decision-making.

  • Used in business, healthcare, finance, and other fields to streamline processes, reduce costs, and improve decision-making.

3

Types of Classification

Binary classification: yes/no categorization for two possible outcomes.

  • Example: Cat vs. Dog Classification

Multi-class classification involves categorizing data into three or more classes. This is more complex than binary classification because the model must learn to distinguish between multiple categories simultaneously.

  • Example: sentiment analysis (positive, neutral, negative)

4

Decision Tree

A powerful and intuitive machine learning technique that learns and expresses classification or prediction patterns in the form of a tree structure.
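A rough sketch of the "patterns expressed as a tree" idea using scikit-learn's export_text (the iris dataset is only a stand-in example, not from the deck):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The learned classification pattern, printed as a nested tree of if/else rules.
print(export_text(tree, feature_names=list(iris.feature_names)))
```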

5

Advantages of Decision Tree Algorithms

Easy to understand and visualize, making them ideal for explaining model decisions to non-experts.

  • Requires fewer data preparation steps than many other machine learning algorithms.

  • It is a non-parametric algorithm, i.e., it does not require many assumptions about the data.

6

Disadvantage of Decision Tree Algorithms

Prone to overfitting, creating complex decision rules that capture noise in the training data, leading to poor generalization on unseen samples.

  • Training decision trees can be highly time-consuming, especially when dealing with large datasets or multiple continuous independent variables

    • finding optimal split points requires evaluating numerous thresholds.

  • Decision trees are inherently unstable, with small changes in the input data potentially causing dramatic shifts in the tree structure and its predictions,

    • making them sensitive to variations in the training set.

7

Decision Tree Model Steps

Step 1) Kicked Vehicle Data

Step 2) Data Preprocessing

Step 3) Build a decision tree on training data

  • learn the relationship between predictors and target variable

Step 4) Evaluate Decision Tree Model Performance

8

Step 1: Kicked Vehicle Data

The dataset includes information about vehicles bought at an auction, such as odometer readings, warranty costs, vehicle age, etc.

  • Predictor Variables (X)

  • Target Variable (Y)

  • Classifier
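A sketch of separating the predictors (X) from the target (Y) with pandas; the column names below are hypothetical stand-ins for the kicked-vehicle fields, not the actual dataset schema:

```python
import pandas as pd

# Hypothetical kicked-vehicle rows; the column names are illustrative only.
df = pd.DataFrame({
    "VehicleAge":   [3, 7, 2, 5],
    "Odometer":     [45000, 98000, 30000, 72000],
    "WarrantyCost": [1100, 1600, 900, 1300],
    "IsBadBuy":     ["No", "Yes", "No", "Yes"],   # target: was the purchase a "kick"?
})

X = df.drop(columns=["IsBadBuy"])   # predictor variables
y = df["IsBadBuy"]                  # target variable
```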

9

Predictor

Variable used in forecasting the values of another variable, known as the target variable.

10

Target Variable

Variable whose outcomes are modeled and predicted based on the predictors.

11

Classifier

Type of model or algorithm in machine learning that categorizes input data into specific classes.

12

Step 2: Data Preprocessing

Dummy coding

Splitting Data

13

Dummy coding

Convert categorical variables into numeric format.

Create dummy variables to replace the original categorical variable.

  • Example of a binary variable: gender
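A small pandas sketch of dummy coding, using the binary gender example from this card (the data values are made up):

```python
import pandas as pd

df = pd.DataFrame({"gender": ["Male", "Female", "Female", "Male"]})

# Replace the original categorical column with a 0/1 dummy column.
df_numeric = pd.get_dummies(df, columns=["gender"], drop_first=True)
print(df_numeric)   # single column 'gender_Male' holding 1/0 (True/False in newer pandas)
```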

14

Splitting Data (Data Preprocessing)

Dividing the dataset into training and testing sets for model evaluation
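One common way to do this split in Python with scikit-learn (the toy data and the 25% test size are assumptions):

```python
from sklearn.model_selection import train_test_split

X = [[3, 45000], [7, 98000], [2, 30000], [5, 72000]]   # toy predictors
y = ["No", "Yes", "No", "Yes"]                          # toy target labels

# Hold out 25% of the rows as a test set for later model evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```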

15

Decision Tree structure

Leaf Node

Decision Node

Root Node

16

Leaf Node

Endpoint of the tree where no further splitting occurs. It represents a class label (outcome), such as a prediction result (Yes or No).

17

Decision Node

Includes both the root node and internal nodes. These nodes contain a predictor and make decisions based on input features.

18

Root Node

First node of the tree, represents the entire population or dataset. It splits into two or more homogeneous subsets based on the most significant attribute.

19

Step 3: Build a Decision Tree

Splitting Data

Entropy

Information Gain

Recursive Splitting
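A sketch of this step with scikit-learn, assuming the preprocessed training data from Step 2; setting criterion='entropy' ties the splits to the entropy and information-gain ideas on the following cards (the data and hyperparameter values are arbitrary assumptions):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data standing in for the preprocessed kicked-vehicle set.
X_train = [[3, 45000], [7, 98000], [2, 30000], [5, 72000]]
y_train = ["No", "Yes", "No", "Yes"]

# Grow the tree on the training data; splits are chosen to maximize information gain.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
tree.fit(X_train, y_train)
```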

20

Splitting Data

Decision trees split the dataset based on a predictor that creates the most homogeneous subgroups, measured by the purity of the subset.

21

Entropy

Represents the randomness or impurity in the dataset. It is used to decide how to split data within the decision tree.

  • Lower _____ indicates purer data subsets.

  • Higher information gain = lower _____
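A minimal sketch of the entropy calculation, H = -sum(p_i * log2(p_i)) over the class proportions p_i (the label lists are made up):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p)) over class proportions."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

print(entropy(["Yes", "Yes", "Yes", "Yes"]))   # 0.0 -> pure subset
print(entropy(["Yes", "Yes", "No", "No"]))     # 1.0 -> maximally impure for two classes
```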

22

Information Gain

The reduction in entropy after a dataset is split based on a predictor. It helps in selecting the best feature for splitting the data at each step

  • A higher value means a better split.
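A sketch of information gain as the parent's entropy minus the size-weighted entropy of the child subsets produced by a split (the split shown is a made-up perfect separation):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, child_subsets):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent_labels)
    weighted = sum((len(s) / total) * entropy(s) for s in child_subsets)
    return entropy(parent_labels) - weighted

parent = ["Yes", "Yes", "No", "No"]
# A split that perfectly separates the classes gives the largest possible gain (1.0 here).
print(information_gain(parent, [["Yes", "Yes"], ["No", "No"]]))
```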

23

Recursive Splitting

Decision trees use a recursive approach to continue splitting the data until a stopping criterion is met

  • (e.g., all nodes are pure or maximum depth is reached).
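A compressed, illustrative sketch of the recursive idea in plain Python (not any particular library); it searches for the best entropy-based split and stops when a node is pure, no useful split exists, or an assumed maximum depth is reached:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_split(X, y):
    """Pick the (gain, feature, threshold) with the highest information gain, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            gain = entropy(y) - (len(left) / len(y)) * entropy(left) - (len(right) / len(y)) * entropy(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively split until a stopping criterion is met (pure node, no split, max depth)."""
    split = best_split(X, y)
    if len(set(y)) == 1 or depth == max_depth or split is None:
        return {"predict": Counter(y).most_common(1)[0][0]}          # leaf node
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth)}

print(build_tree([[3], [7], [2], [5]], ["No", "Yes", "No", "Yes"]))
```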

24

Step 4: Evaluate Decision Tree Performance

Confusion Matrix

Evaluation Metrics

25

Confusion Matrix

A table used to evaluate the performance of a classification model by comparing predicted values to actual values.
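A quick sketch of a 2x2 confusion matrix with scikit-learn (the actual and predicted label vectors are made up):

```python
from sklearn.metrics import confusion_matrix

y_actual    = ["No", "No", "Yes", "Yes", "No", "Yes"]
y_predicted = ["No", "Yes", "Yes", "No", "No", "Yes"]

# Rows are actual classes, columns are predicted classes, in the order ["No", "Yes"].
print(confusion_matrix(y_actual, y_predicted, labels=["No", "Yes"]))
# [[2 1]
#  [1 2]]
```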

26

Evaluation Metrics

Precision

Recall

F-Measure

Pruning

27

Precision

Ratio of correctly predicted positive observations to the total predicted positives

  • shows how precise the model is when predicting the positive class.

28

Recall

Ability of the model to correctly identify all positive cases.

29

F-Measure

Harmonic mean of precision and recall, providing a balanced measure of the model’s performance.
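A sketch computing precision, recall, and the F-measure together with scikit-learn, treating "Yes" as the positive class (precision = TP/(TP+FP), recall = TP/(TP+FN), F = harmonic mean of the two); the label vectors are the same made-up ones as in the confusion-matrix sketch:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_actual    = ["No", "No", "Yes", "Yes", "No", "Yes"]
y_predicted = ["No", "Yes", "Yes", "No", "No", "Yes"]

p = precision_score(y_actual, y_predicted, pos_label="Yes")   # TP / (TP + FP) = 2/3
r = recall_score(y_actual, y_predicted, pos_label="Yes")      # TP / (TP + FN) = 2/3
f = f1_score(y_actual, y_predicted, pos_label="Yes")          # 2pr / (p + r)  = 2/3
print(p, r, f)
```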

30

Pruning

Process used to reduce the complexity of the model and avoid overfitting by removing branches that have little importance.
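One concrete way to prune in scikit-learn is cost-complexity pruning via the ccp_alpha parameter (the dataset and the alpha value here are arbitrary assumptions; limiting max_depth is another common form of pre-pruning):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

unpruned = DecisionTreeClassifier(random_state=42).fit(X, y)
pruned   = DecisionTreeClassifier(random_state=42, ccp_alpha=0.01).fit(X, y)  # prune weak branches

# The pruned tree has far fewer nodes, which usually reduces overfitting.
print(unpruned.tree_.node_count, pruned.tree_.node_count)
```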