5-Overfitting and Decision Threshold

Decision Boundary of a Classification Model

  • Definition:

    • A conceptual dividing line that separates different classes in a classification problem

    • Determined by machine learning algorithms based on training data

  • Importance:

    • More features can improve the model, but they make the decision boundary harder to visualize

    • The decision boundary reflects the mapping the model has learned from feature values to predicted classes

Visualization of Decision Boundary

  • Decision Tree Model Example:

    • Pictorial representation of decision boundary based on Age and Balance vs Default status

    • Illustrates how different splits can lead to different labels (Default vs Not Default)

Complexity of Algorithms

  • Different Models:

    • Different algorithms can learn boundaries of varying complexities

    • Model selection must consider both the algorithm's assumptions and the nature of the data

Stopping Criteria for Splitting

  • When to Stop Splitting:

    • Achieve minimum impurity (pure node with uniform label)

    • Avoid unnecessary additional splits that lead to no information gain (or decrease in Gini, or decrease in Variance)
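
  • Sketch: the stopping rule above can be made concrete with a small Gini-impurity calculation (illustrative code, not from the notes; the labels are made up):

      from collections import Counter

      def gini(labels):
          """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
          n = len(labels)
          return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

      print(gini(["Default"] * 5))          # 0.0 -> pure node, nothing left to split

      parent = ["Default", "Default", "Not Default", "Not Default"]
      left   = ["Default", "Not Default"]   # a candidate split whose children keep
      right  = ["Default", "Not Default"]   # the same class mix as the parent
      weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
      print(gini(parent) - weighted)        # 0.0 -> no decrease in Gini, so stop splitting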

Overfitting in Decision Trees

  • Definition of Overfitting:

    • Full trees may memorize training data without learning general patterns

    • Model learns noise as patterns from training data, failing to generalize

  • Example of Overfitted Model:

    • Complex decision rules based on overly narrow thresholds (e.g., very specific humidity levels)

Reasons for Overfitting

  • Cause:

    • Random noise or fluctuations in training data are learned as significant patterns

Evaluating Overfitting

  • Error Analysis:

    • Shows performance discrepancies between training and test sets as tree complexity increases
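
  • Sketch: a hedged way to reproduce this error analysis with scikit-learn, training trees of increasing depth and comparing training vs. test accuracy (the synthetic dataset and depth values are illustrative assumptions):

      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

      for depth in [1, 3, 5, 10, None]:     # None grows the full tree
          tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
          print(depth,
                round(tree.score(X_train, y_train), 3),   # training accuracy keeps rising
                round(tree.score(X_test, y_test), 3))     # test accuracy plateaus or drops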

Simpler Tree Structures

  • Goal:

    • Favor simpler models to avoid overfitting while still retaining predictive power

  • Importance of Majority Label:

    • Assign majority class label based on class distribution in node

Supervised Learning Goals

  • Primary Objective:

    • Discover patterns (making predictions) that generalize well to unseen data

    • A model that works extremely well on the training data will not necessarily work well on the test data

Avoiding Overfitting: Simplifying Models

  • Techniques:

    • Early stopping in decision tree by setting a maximum depth
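
  • Sketch: one possible way to express early stopping (pre-pruning) in scikit-learn; the parameter values are illustrative placeholders to be tuned, not recommendations:

      from sklearn.tree import DecisionTreeClassifier

      pruned_tree = DecisionTreeClassifier(
          max_depth=4,                   # stop splitting beyond this depth
          min_samples_leaf=20,           # each leaf keeps at least 20 training examples
          min_impurity_decrease=0.001,   # require a minimum impurity reduction to split
          random_state=0,
      )
      # pruned_tree.fit(X_train, y_train) would then grow a deliberately simpler tree.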

Hyper-Parameter Tuning (How do we determine the maximum depth?)

  • Hyperparameter: A parameter whose value is set before the learning process begins, which can significantly affect the performance of the model.

  • Tuning ML hyperparameters (such as decision-tree depth) is a tedious yet crucial task, as the performance of an algorithm can be highly dependent on the choice of hyperparameters.

  • Process (in order to find the best parameter):

    • Hyperparameter tuning involves searching over a range of candidate values to find the combination that delivers the best performance on your data.

Pitfalls in Hyper-Parameter Tuning

  • Common Mistakes:

    • Using test data in the training process causes data leakage

    • Need to keep test data separate for unbiased evaluations

Hyper-Parameter Tuning with Validation Set

  • Recommendation:

    • Utilize a validation set for tuning decisions while reserving the test set for final evaluations

    • A common split is 80% training and 20% testing, with part of the training portion held out as the validation set (see the sketch below)
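
  • Sketch: one way to set up the train/validation/test arrangement with scikit-learn, reusing the X and y arrays from the earlier depth sketch (the 80/20 and 75/25 proportions are one common choice, not a rule):

      from sklearn.model_selection import train_test_split

      # First reserve 20% of the data as the untouched test set.
      X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

      # Then carve a validation set out of the remaining training data
      # (25% of it, i.e. 20% of the original dataset), used only for tuning decisions.
      X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)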

Grid Search vs Random Search

  • Grid Search:

    • Evaluates all combinations of hyperparameters within a pre-defined grid

    • Guarantees finding the best combination within the grid (provided the grid is well defined), but is computationally expensive

  • Random Search:

    • Evaluates randomly sampled combinations instead of exhaustively evaluating all of them

    • More efficient and often yields good results
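
  • Sketch: a side-by-side comparison using scikit-learn's GridSearchCV and RandomizedSearchCV (the parameter grid is an illustrative assumption):

      from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
      from sklearn.tree import DecisionTreeClassifier

      param_grid = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 10, 50]}

      # Grid search: evaluates all 4 x 3 = 12 combinations.
      grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)

      # Random search: samples only a fixed number of combinations (here 5 of the 12).
      rand = RandomizedSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                                n_iter=5, cv=5, random_state=0)

      # grid.fit(X_train, y_train); grid.best_params_ then gives the best combination found.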

K-Fold Cross Validation (another approach to hyperparameter tuning)

  • Concept:

    • Divides the dataset into K equal “folds” (subsets of the data)

    • The model is trained and evaluated multiple times, using different training and validation sets each time to ensure that the model's performance is consistent and not reliant on any specific partition of the data.

    • Each fold serves as validation once while being excluded from training

  • Advantages:

    • Better data usage and mitigates dependency on single train-test splits

  • Disadvantages:

    • High computational cost and time-consuming

  • If there are 9 hyperparameter options and 5 folds are used, 9 × 5 = 45 models are built

  • Steps:

    1. Split the dataset into K folds.

    2. Train the model K times, each time using K-1 folds for training and 1 fold for validation.

    3. Compute the average performance metric (e.g., accuracy, F1-score, RMSE).

    4. Select the best hyperparameter combination based on the highest validation performance.

    5. Train the final model using the entire dataset with the best hyperparameters.

Standard Supervised Learning Process

  • Steps:

    1. Split data into training and test sets (e.g., 80:20)

    2. Apply simple train/validation split, or k-fold cross-validation on the training set only (e.g., 5-fold CV).

      I. If using cross validation, each iteration uses k-1 folds for training and 1 fold for validation.

    3. Final model evaluation conducted on the untouched test set for fair results
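
  • Sketch: a concise end-to-end version of this process, assuming scikit-learn and a synthetic dataset (illustrative, not from the notes):

      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split, GridSearchCV
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)  # step 1

      search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                            {"max_depth": [2, 4, 6, 8]}, cv=5)      # step 2: 5-fold CV on the training set only
      search.fit(X_train, y_train)

      print(search.best_params_)
      print(search.score(X_test, y_test))                           # step 3: one final evaluation on the untouched test set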

Assigning Labels for Leaf Nodes

  • Leaf Nodes:

    • Represent outcomes in classification (predicted labels) or regression (predicted values)

    • Majority voting for classification or mean calculations for regression
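
  • Sketch: a toy illustration of how a leaf's prediction is formed (made-up labels and values, not library internals):

      from collections import Counter

      leaf_labels = ["Default", "Not Default", "Default", "Default"]
      majority_label = Counter(leaf_labels).most_common(1)[0][0]    # classification: "Default"

      leaf_targets = [120.0, 150.0, 90.0]
      mean_prediction = sum(leaf_targets) / len(leaf_targets)       # regression: 120.0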

Defining the Decision Threshold

  • Decision Threshold Role:

    • Converts the probability values accompanying predictions into binary labels based on a selected threshold

    • The threshold should be chosen based on context rather than fixed at 0.5; it can and should be adjusted to suit the application

Probabilistic Predictions and Impact of Decision Threshold

  • Example:

    • Predicted probabilities indicate likelihoods (e.g., chance of subscription cancellation)

    • A higher decision threshold leads to more conservative predictions (fewer classifications as positive)
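
  • Sketch: applying a custom decision threshold to probabilistic predictions, assuming a fitted classifier (e.g., the final_model from the cross-validation sketch above) and the held-out X_test; the 0.7 threshold is an illustrative assumption:

      proba = final_model.predict_proba(X_test)[:, 1]   # predicted probability of the positive class

      default_labels = (proba >= 0.5).astype(int)       # default threshold of 0.5
      strict_labels  = (proba >= 0.7).astype(int)       # higher threshold -> fewer positive predictions
                                                        # (more conservative classifications)
      print(default_labels.sum(), strict_labels.sum())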

Pros and Cons of Decision Tree Learning

  • Advantages:

    • Easy interpretation and understanding

    • Minimal data preparation required (e.g., no need for one-hot encoding of categorical features)

  • Drawbacks:

    • Prone to overfitting; requires pruning

    • Tree structure is not very stable and can vary significantly with small changes in the training data, leading to high variance in model performance.

    • Predictive accuracy can be mediocre