3 Decision tree


27 Terms

1
New cards

Decision Tree

Decision trees split data into branches based on feature values, creating decision nodes until reaching final predictions. A decision tree is a supervised learning algorithm for classification and regression tasks. It represents decisions and their possible outcomes in a tree-like structure, making the decision-making process transparent and interpretable.
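As a concrete, hedged illustration, here is a minimal sketch of fitting such a tree with scikit-learn's DecisionTreeClassifier; the Iris dataset and parameter choices are assumptions for demonstration, not part of the card.

```python
# A minimal sketch of training a decision tree classifier with scikit-learn.
# The Iris dataset and the split ratio are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Each internal node tests one feature; leaves hold the predicted class.
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```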

2
New cards

pruning in the context of decision trees

Pruning is a technique used to reduce the complexity of a decision tree by removing unnecessary branches.

It aims to:

  • Reduce overfitting.

  • Improve model generalization on unseen data.

Two common types are:

  1. Pre-pruning: stops tree growth early based on criteria (e.g., maximum depth, minimum samples).

  2. Post-pruning: removes branches from a fully grown tree using validation data.

Pruning ensures that the tree remains interpretable and avoids capturing noise in the training data.
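A hedged sketch of both pruning styles, assuming scikit-learn: pre-pruning via growth constraints, and post-pruning via cost-complexity pruning (ccp_alpha) selected on held-out validation data. The dataset and parameter values are assumptions.

```python
# Sketch: pre-pruning vs post-pruning with scikit-learn (illustrative values).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with constraints such as max_depth.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: compute the cost-complexity pruning path of a fully grown tree,
# then pick the alpha that performs best on validation data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score
print(f"Best ccp_alpha={best_alpha:.4f}, validation accuracy={best_score:.3f}")
```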

3
New cards

Root Node

The decision-making process starts at the root node, representing the entire dataset. The model evaluates the best feature to split the data based on a chosen criterion like Gini Impurity, Information Gain (for classification), or Mean Squared Error (for regression).

4
New cards

How do we find the root node

  1. Calculate Impurity/Information Gain:

    • Gini Index: Measures impurity. Lower values are better.

    • Entropy: Measures disorder. Higher information gain is better.

  2. Evaluate Each Feature:

    • For each feature, calculate the Gini Index or Entropy for all possible splits.

    • Determine the information gain for each split.

  3. Select the Best Feature:

    • Choose the feature with the lowest Gini Index or highest information gain.

    • This feature becomes the root node.
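These calculations can be sketched in a few lines of Python; the function names are my own and the toy labels are assumptions.

```python
# A small sketch of the quantities used to pick the root node:
# Gini index, entropy, and the information gain of a candidate split.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Example: splitting [1,1,1,0,0,0] into two pure halves gives maximum gain.
parent = np.array([1, 1, 1, 0, 0, 0])
left, right = parent[:3], parent[3:]
print(gini(parent))                             # 0.5
print(information_gain(parent, [left, right]))  # 1.0
```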

5
New cards

Splitting

The dataset is divided into subsets based on the selected feature's values, creating branches from the root node. This process continues recursively at each subsequent node.

6
New cards

Internal Node

These nodes represent decision points where the dataset is further split based on other features.

7
New cards

Leaf Node

These are terminal nodes that provide the final prediction.

For classification tasks, the leaf nodes contain class labels, while for regression tasks, they provide numerical outputs.

8
New cards

Stopping Criteria:

  • All data points in a node belong to the same class (pure node).

  • A maximum tree depth is reached.

  • Minimum samples per leaf or minimum information gain criteria are met.
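Assuming scikit-learn, these stopping criteria correspond roughly to the following hyperparameters; the specific values shown are arbitrary assumptions.

```python
# Sketch: mapping the stopping criteria onto scikit-learn hyperparameters.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,                # stop when a maximum tree depth is reached
    min_samples_leaf=10,        # require a minimum number of samples per leaf
    min_impurity_decrease=0.01, # stop when the impurity decrease (gain) is too small
)
# A node also stops automatically when it is pure (all samples share one class).
```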

9
New cards

How do we split

  • Entropy

  • Gini Index

10
New cards

Pure split vs Impure split

A split is pure if all the resulting child nodes contain instances of only one class.

A split is impure if the resulting child nodes contain a mix of different classes.

11
New cards

For a huge dataset, which split method will we use?

Gini Impurity (categorical features)

12
New cards

Problem with the decision tree

A single decision tree grown to full depth tends to overfit the training data.

13
New cards

How do we reduce the overfitting of data

  • Post Pruning

  • Pre Pruning

14
New cards

Post Pruning

We grow the complete decision tree first and later cut it back (pruning).

15
New cards

For smaller datasets we will use ____ pruning

Post

16
New cards

Pre Pruning

We constrain the tree while it is being constructed, using parameters such as max features, max depth, and minimum samples per split.

17
New cards

For huge datasets we will use

Pre Pruning

18
New cards

How does a Random Forest algorithm work?

Random Forest is an ensemble learning method that combines multiple decision trees to improve predictive performance.

1. Bootstrap Sampling: Random subsets of the data are created (with replacement).

2. Tree Construction: Each tree is built independently using a subset of features.

3. Aggregation: For classification, predictions are based on majority voting. For regression, predictions are averaged.

This approach reduces overfitting and increases robustness, making it suitable for high-dimensional data.
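A minimal sketch, assuming scikit-learn's RandomForestClassifier and a synthetic dataset, of how the three steps map onto hyperparameters.

```python
# Sketch: Random Forest with bootstrap sampling, random feature subsets,
# and majority-vote aggregation (dataset and values are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # number of bootstrapped trees
    max_features="sqrt",  # random subset of features considered at each split
    bootstrap=True,       # bootstrap sampling with replacement
    random_state=0,
)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))  # majority vote over trees
```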

19
New cards

Decision Tree vs Random Forest

| Feature | Decision Tree | Random Forest |
|---|---|---|
| Algorithm Type | Single tree-based model | Ensemble of multiple decision trees |
| Overfitting | Prone to overfitting | Less prone to overfitting due to averaging |
| Bias-Variance Tradeoff | High variance, low bias | Lower variance, slightly higher bias |
| Interpretability | Easy to interpret and visualize | Harder to interpret due to multiple trees |
| Training Time | Faster training time | Slower training time due to multiple trees |
| Prediction Time | Faster prediction time | Slower prediction time due to averaging predictions |
| Accuracy | Can be less accurate due to overfitting | Generally more accurate due to ensemble approach |

20
New cards

How do we select a feature to split on?

Using entropy (a measure of the randomness of the data).

21
New cards

Gini Index vs Entropy

| Aspect | Gini Index | Entropy |
|---|---|---|
| Definition | Measures the impurity of a dataset. | Measures the disorder or uncertainty in a dataset. |
| Formula | $\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2$ | $\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2(p_i)$ |
| Range | 0 (pure) to 0.5 (maximum impurity for binary classification) | 0 (pure) to 1 (maximum disorder for binary classification) |
| Interpretation | Lower values indicate purer nodes. | Higher values indicate more disorder. |
| Calculation Complexity | Simpler and faster to compute. | More complex and computationally intensive. |
| Usage in Decision Trees | Often preferred due to computational efficiency. | Provides more information gain but is computationally heavier. |
| Sensitivity to Changes | Less sensitive to changes in the dataset. | More sensitive to changes in the dataset. |

22
New cards

TYPES of Decision trees - CART vs C4.5 vs ID3

| Aspect | CART | C4.5 | ID3 |
|---|---|---|---|
| Splitting Criterion | Gini Index | Information Gain Ratio | Information Gain |
| Tree Structure | Binary trees (two children per node) | Multi-way trees (multiple children per node) | Multi-way trees (multiple children per node) |
| Data Types | Continuous and categorical | Continuous and categorical | Categorical only |
| Pruning | Cost-complexity pruning | Error-based pruning | No pruning |
| Handling Missing Values | Surrogate splits | Assigns probabilities | Does not handle missing values |
| Advantages | Simple, fast, easy to interpret | Handles both data types, robust pruning | Simple, easy to understand |
| Disadvantages | Can overfit without pruning | More complex, computationally intensive | Can overfit, does not handle continuous data |
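For reference, scikit-learn's tree module implements an optimized CART-style binary tree; a short sketch of switching its splitting criterion (the variable names are my own).

```python
# Sketch: scikit-learn trees are CART-style (binary); the criterion can be
# switched between Gini (CART's default) and entropy (ID3/C4.5-style).
from sklearn.tree import DecisionTreeClassifier

cart_gini = DecisionTreeClassifier(criterion="gini")
cart_entropy = DecisionTreeClassifier(criterion="entropy")
```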

23
New cards

Do we require Scaling ?

Tree-based Algorithms: Algorithms like Decision Trees and Random Forests don't need scaling because they split data based on feature values, not distances.
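A quick sketch (assuming scikit-learn and the Iris data) suggesting why scaling does not matter for trees: rescaling a feature shifts the split thresholds but should leave the resulting partitions, and hence the predictions, unchanged.

```python
# Sketch: a tree trained on raw features vs standardized features is expected
# to produce the same predictions, since splits depend only on value ordering.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

pred_raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
pred_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)

print(np.array_equal(pred_raw, pred_scaled))  # expected True (up to tie-breaking)
```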

24
New cards

How Does the Decision Tree Algorithm Work for Classification in Machine Learning?

A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It works by splitting the dataset into smaller subsets based on feature values, forming a tree-like structure.

1. Select the Best Feature (Splitting Criterion): The algorithm chooses the feature that best separates the data using criteria like Gini Impurity, Entropy, or Information Gain (in ID3, C4.5, or CART algorithms).

2. Split the Dataset: Based on the selected feature, the data is split into branches. Each branch represents a possible decision or outcome.

3. Repeat Recursively: The process continues recursively on each subset until one of the stopping conditions is met:

  • All data points in a node belong to the same class.

  • The maximum tree depth is reached.

  • The information gain from further splits is too small.

4. Make Predictions: Once the tree is built, new data points are classified by following the decision paths from the root to a leaf node, where a class label is assigned.
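To see those root-to-leaf decision paths explicitly, one option is scikit-learn's export_text utility; a sketch in which the Iris data and max_depth=3 are assumptions.

```python
# Sketch: print the learned root-to-leaf rules of a fitted decision tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Each printed path is a sequence of feature tests ending in a class label.
print(export_text(clf, feature_names=list(iris.feature_names)))
```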

25
New cards

How Do Gradient Boosted Decision Trees (GBDT) Differ from Random Forests?

Gradient Boosted Decision Trees (GBDT):

An ensemble method where trees are built sequentially. Each tree attempts to correct the errors of the previous one. The focus is on minimizing a loss function (e.g., mean squared error) through gradient descent.

Advantages: Strong predictive performance, especially for structured data, and can handle different types of loss functions.

Random Forests:

An ensemble method that builds multiple decision trees in parallel, each trained on a random subset of the data with random feature selection for splitting nodes.

Advantages: More robust to overfitting than a single decision tree and generally faster to train than GBDT.

GBDT usually performs better in terms of accuracy but is slower and more prone to overfitting if not tuned properly, while Random Forests are easier to train and tune.
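A rough comparison sketch, assuming scikit-learn's GradientBoostingClassifier as the GBDT implementation and a synthetic dataset; the hyperparameter values are arbitrary.

```python
# Sketch: cross-validated comparison of a Random Forest and a GBDT model.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=25, random_state=1)

# Random Forest: many deep trees built independently on bootstrap samples.
rf = RandomForestClassifier(n_estimators=300, random_state=1)

# GBDT: shallow trees built sequentially, each fitting the previous errors.
gbdt = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                  max_depth=3, random_state=1)

for name, model in [("Random Forest", rf), ("GBDT", gbdt)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```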

26
New cards

What distinguishes a poorly performing classifier (a "bad classifier") from a Random Forest model, and how do their design and performance characteristics differ in practice?

A "bad classifier" typically refers to a model that underperforms due to flaws like high bias/variance, poor generalization, or unsuitability for the data (e.g., linear models for non-linear problems). Random Forest, an ensemble of decision trees, addresses many of these weaknesses:

1. Model Complexity:

  a. Bad Classifier: Often overly simplistic (e.g., a single shallow decision tree) or overly complex (e.g., an overfit neural network) for the task.

  b. Random Forest: Combines many trees, balancing flexibility with stability.

2. Overfitting:

  a. Bad Classifier: Prone to overfitting (e.g., a deep, unpruned decision tree) or underfitting (e.g., a linear model on non-linear data).

  b. Random Forest: Mitigates overfitting by averaging predictions across diverse trees trained on random subsets of data/features.

3. Feature Interactions:

  a. Bad Classifier: May ignore complex feature relationships (e.g., logistic regression).

  b. Random Forest: Automatically captures non-linear interactions and hierarchies through split decisions in multiple trees.

4. Robustness:

  a. Bad Classifier: Sensitive to noise, outliers, or irrelevant features (e.g., k-NN without scaling).

  b. Random Forest: Robust to noise and outliers due to majority voting and feature subsampling.

5. Interpretability:

  a. Bad Classifier: Sometimes overly interpretable but ineffective (e.g., Naive Bayes with violated assumptions).

  b. Random Forest: Less interpretable than single trees but provides feature importance scores for insights.

6. Scalability:

  a. Bad Classifier: May scale poorly (e.g., SVM with large datasets).

  b. Random Forest: Parallelizable and efficient for medium-to-large datasets, though slower than gradient-boosted trees.

27
New cards

How Do Partial Dependence Plots Help Interpret Machine Learning Models?

Partial Dependence Plots (PDPs) visualize the relationship between a feature and the predicted outcome, while keeping other features constant.

Function: PDPs help interpret the impact of a single feature or a pair of features on the model's prediction.

Use: These plots show whether a feature has a linear, non-linear, or no significant effect on the target variable, aiding in model interpretability by highlighting the influence of individual predictors.
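A sketch of producing a PDP, assuming scikit-learn >= 1.0 (PartialDependenceDisplay), matplotlib, and the Diabetes dataset with features "bmi" and "s5" as illustrative choices.

```python
# Sketch: partial dependence of the prediction on two features,
# averaging over the other features.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Average prediction as a function of "bmi" and "s5", other features held fixed.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "s5"])
plt.show()
```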