W3: Classification: Decision Tree and Performance Evaluation

0.0(0)

Studied by 1 person

Knowt Play

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/29

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

30 Terms

New cards

Classification

supervised learning technique in machine learning and statistics where the goal is to assign data points (objects) to predefined categories (classes) based on their features (attributes).

Objective: Predict the class label of an object based on its features.
Input: A dataset with labeled examples (features and corresponding class labels).
Output: A model that can predict the class label for new, unseen data.

<p>supervised learning technique in machine learning and statistics where the goal is to assign data points (objects) to predefined categories (classes) based on their features (attributes).</p><ul><li><p><strong>Objective: </strong>Predict the class label of an object based on its features.</p></li><li><p><strong>Input: </strong>A dataset with labeled examples (features and corresponding class labels).</p></li><li><p><strong>Output: </strong>A model that can predict the class label for new, unseen data.</p></li></ul><p></p>

New cards

Why classification?

Predicting the class or category of an action to enable timely and cost-efficient decision-making.

used in business, healthcare, finance, and other fields to streamline processes, reduce costs, and improve decision-making.

New cards

Types of Classification

Yes/no categorization: for two possible outcomes.

Example: Cat vs. Dog Classification

—

Multi-class classification involves categorizing images into three or more classes. This is more complex than binary classification as the model must learn to distinguish between multiple categories simultaneously.

Example: sentiment analysis (positive, neutral, negative)

<p>Yes/no categorization: for two possible outcomes.</p><ul><li><p>Example: Cat vs. Dog Classification</p></li></ul><p>—</p><p><strong>Multi-class classification</strong> involves categorizing images into three or more classes. This is more complex than binary classification as the model must learn to distinguish between multiple categories simultaneously.</p><ul><li><p>Example: sentiment analysis (positive, neutral, negative)</p></li></ul><p></p>

New cards

Decision Tree

powerful and intuitive machine learning technique that learn and express classification or prediction patterns in the form of a tree structure.

<p>powerful and intuitive machine learning technique that learn and express classification or prediction patterns in the form of a tree structure.</p>

New cards

Advantages of Decision Tree Algorithms

Easy to understand and visualize, making them ideal for explaining model decisions to non-experts.

Less number of data preparation steps unlike other machine learning algorithms
It is a non-parametric algorithm i.e. it does not require lot of assumptions

New cards

Disadvantage of Decision Tree Algorithms

prone to overfitting, creating complex decision rules that capture noise in training data, leading to poor generalization on unseen samples.

Training decision trees can be highly time-consuming, especially when dealing with large datasets or multiple continuous independent variables
- finding optimal split points requires evaluating numerous thresholds.
Decision trees are inherently unstable, with small changes in the input data potentially causing dramatic shifts in the tree structure predictions
- making them sensitive to variations in the training set.

New cards

Decision Tree Model Steps

Step 1) Kicked Vehicle Data

Step 2) Data Preprocessing

Step 3) Build a decision tree on training data

learn the relationship between predictors and target variable

Step 4) Evaluate Decision Tree Model Performance

New cards

Step 1: Kicked Vehicle Data

The dataset includes information about vehicles bought at an auction, such as odometer readings, warranty costs, vehicle age, etc.

Predictor Variables (X)
Target Variable (Y)
Classifier

New cards

Predictor

variable used in forecasting the values of another variable, known as the target variable

New cards

Target Variable

Variable whose outcomes are modeled and predicted based on the predictors.

New cards

Classifier

type of model or algorithm in machine learning that categorizes input data into specific classes

New cards

Step 2: Data Preprocessing

Dummy coding

Splitting Data

New cards

Dummy coding

convert categorical variables into numeric format

Create dummy variables to replace the original categorical variable

binary variable: gender

New cards

Splitting Data (Data Preprocessing)

Dividing the dataset into training and testing sets for model evaluation

New cards

Decision Tree structure

Leaf Node

Decision Node

Root Node

New cards

Leaf Node

Endpoint of the tree where no further splitting occurs. It represents a class label (outcome), such as a prediction result (Yes or No).

New cards

Decision Node

Includes both the root node and internal nodes. These nodes contain a predictor and make decisions based on input features.

New cards

Root Node

First node of the tree, represents the entire population or dataset. It splits into two or more homogeneous subsets based on the most significant attribute.

New cards

Step 3: Build a Decision Tree

Splitting Data

Entropy

Information Gain

Recursive Splitting

New cards

Splitting Data

Decision trees split the dataset based on a predictor that creates the most homogeneous subgroups, measured by the purity of the subset.

New cards

Entropy

Represents the randomness or impurity in the dataset. It is used to decide how to split data within the decision tree.