Classification
is a supervised machine learning technique that sorts data into predefined categories or classes.
Goal of classification
The goal is to predict a categorical outcome (label/class) based on input data.
Training Phase
A classification model is built using a dataset where each data point is already labeled with its correct class.
Prediction Phase
Once the model is trained, it can be used to classify new, unlabeled data.
Binary Classification
This is the simplest type, where the model predicts between two possible classes.
Multiclass Classification
The model predicts among more than two classes.
Accuracy
(TP + TN) / (TP + TN + FP + FN)
Precision
TP / (TP + FP)
Recall
TP / (TP + FN)
F1-Score
2 × (Precision × Recall) / (Precision + Recall)
Specificity
TN / (TN + FP)
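A minimal Python sketch of the five formulas above, working from raw confusion-matrix counts. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the five metrics from confusion-matrix counts.

    A ratio whose denominator is zero is reported as 0.0 (a convention chosen here).
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN (100 predictions in total).
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.85, precision is about 0.889, recall is 0.80, F1 is about 0.842 and specificity is 0.90.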
Logistic Regression
is a statistical and machine learning method used for classification problems, especially when the outcome has two possible results.
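One way to see logistic regression in action: the model passes a linear score w*x + b through the sigmoid function to get a probability between 0 and 1, then applies a threshold (0.5 here) to pick a class. This sketch fits w and b by plain batch gradient descent on the log-loss for a single input feature. The learning rate, epoch count, function names and the hours-studied data are illustrative assumptions; libraries typically use more robust solvers and add regularization.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss with respect to w*x + b
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours studied -> failed (0) or passed (1).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)
print(f"w={w:.2f}, b={b:.2f}, P(pass | 2.5 h)={sigmoid(w * 2.5 + b):.2f}")
print([predict(w, b, x) for x in hours])
```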
Decision tree
is a supervised machine learning algorithm that can be used for both classification (sorting data into categories) and regression (predicting continuous values) tasks.
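A minimal sketch of how a classification tree is grown, assuming numeric features, binary splits of the form x[f] <= t, and Gini impurity as the split criterion (entropy is a common alternative). The max_depth parameter is the kind of limit that keeps a tree from memorizing noise. The weather data and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature, threshold) pair and return the one with the lowest weighted Gini."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, no split exists, or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the most common class
    f, t, _ = best_split(rows, labels)
    if f is None:
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf by following the split tests."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical weather data: [temperature in °C, humidity in %] -> "play" or "stay".
rows = [[30, 85], [27, 90], [28, 78], [21, 96], [20, 80], [18, 70], [24, 65], [22, 95]]
labels = ["stay", "stay", "play", "stay", "play", "play", "play", "stay"]
tree = build_tree(rows, labels, max_depth=3)
print(tree)
print(predict(tree, [25, 92]))  # humidity 92 > 80 -> "stay"
```

On this data the root split is humidity <= 80, which separates the two classes perfectly, so both children are pure leaves.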
Overfitting
A decision tree can grow very deep and complex, essentially memorizing the noise and small fluctuations in the training data rather than learning the true underlying patterns.
Instability
Decision trees are very sensitive to small changes in the training data: adding, removing or altering a few samples can produce a very different tree.
Bias toward dominant classes
If the dataset is imbalanced (one class has significantly more data points than others), the tree may become biased towards the majority class and fail to generalize well for the minority classes.
Random Forests
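are an ensemble learning method that addresses the weaknesses of a single decision tree by training many trees on random bootstrap samples of the data (and random subsets of features) and combining their predictions, by majority vote for classification or by averaging for regression.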
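The sketch below illustrates the two ingredients behind random forests: each tree is trained on a bootstrap sample (rows drawn with replacement) using a random subset of features, and predictions are combined by majority vote. To stay short it uses depth-1 trees (stumps); with only one split per tree, sampling features once per tree is the same as sampling them per split, which is what full random forests do with deeper trees. All names, the tree count and the data are assumptions for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Depth-1 tree: the best single split x[f] <= t over the given features (lowest weighted Gini)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: fall back to a constant prediction
        label = Counter(labels).most_common(1)[0][0]
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample, with replacement
        feats = rng.sample(range(d), max_features)     # random subset of feature indices
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    """Majority vote over the individual trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Hypothetical data: class 0 near (1.5, 1.5), class 1 near (8.5, 8.5).
rows = [[1, 2], [2, 1], [1.5, 1.5], [2, 2], [8, 9], [9, 8], [8.5, 8.5], [9, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels, n_trees=25, max_features=1, seed=0)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [9.5, 9.5]))
```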
K-Nearest Neighbors (KNN)
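The idea behind K-Nearest Neighbors (KNN) is very simple: to classify a new data point, find the k training points closest to it (for example by Euclidean distance) and assign the class that is most common among those neighbors.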
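A minimal sketch of KNN classification with Euclidean distance and a plain majority vote. There is no real training step: the model just stores the labeled points. The data and function name are hypothetical; with two classes, an odd k avoids tied votes.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean distance)."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points in two well-separated groups.
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, [1.5, 1.5]))  # A
print(knn_predict(train_X, train_y, [6.5, 6.5]))  # B
```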
Regression
is a statistical method used to predict or explain the relationship between independent variables (IVs) and a dependent variable (DV).
Goal of regression
The objective is to find the best-fitting curve for a dependent variable in a multidimensional space, with each independent variable being a dimension.
Simple Linear Regression
A statistical method used to model the relationship between one independent variable (X) and one dependent variable (Y) by fitting a straight line.
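The least-squares line can be computed in closed form. This sketch applies the standard formulas for the slope and intercept; the function name and data are illustrative.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1*x.

    b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2),  b0 = y_mean - b1 * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical points that lie exactly on y = 1 + 2x, so the fit should recover b0 = 1, b1 = 2.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)        # 1.0 2.0
print(b0 + b1 * 6)   # prediction for x = 6: 13.0
```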
Multiple Linear Regression
predicts a dependent variable based on multiple predictors.
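For several predictors, ordinary least squares solves the normal equations (XᵀX)b = Xᵀy. The sketch below builds that system in pure Python and solves it by Gaussian elimination with partial pivoting. It assumes XᵀX is invertible (no perfectly collinear predictors); numerical libraries use more stable methods such as QR decomposition. The function name and data are hypothetical.

```python
def fit_multiple_linear(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] from the normal equations (X^T X) b = X^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend a column of ones for the intercept
    p = len(A[0])
    # Augmented system M = [X^T X | X^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)] + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_multiple_linear(X, y)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, 3.0]
```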
The coefficient of determination or R²
measures the proportion of variance in the dependent variable explained by the independent variable(s).
Mean Absolute Error (MAE)
measures the average absolute difference between actual and predicted values.
Mean Squared Error (MSE)
measures the average of squared differences between actual and predicted values.
Root Mean Squared Error (RMSE)
is the square root of MSE and is one of the most commonly used regression metrics.
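The four regression metrics above, computed side by side on a small hypothetical example. R² is calculated as 1 - SS_res / SS_tot, which assumes the actual values are not all identical (otherwise SS_tot is zero). The function name is illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """R², MAE, MSE and RMSE for paired lists of actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical actual vs. predicted values.
actual = [3, 5, 7, 9]
predicted = [2.5, 5, 7.5, 10]
print(regression_metrics(actual, predicted))
```

For these values R² is 0.925, MAE is 0.5, MSE is 0.375 and RMSE is about 0.612. RMSE is in the same units as the dependent variable, which is one reason it is so widely reported.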
Model Lifecycle
describes the complete journey of a machine learning model, starting from identifying and defining the problem, collecting and preparing data, building and training the model, evaluating its performance, deploying it into real-world use, and continuously monitoring and maintaining it to ensure accuracy and effectiveness over time.
Problem Definition
In this stage, the goal is to clearly understand what problem the model will solve and why it is needed.
Data Collection
involves gathering all relevant information needed to train the model.
Data Preparation
is a crucial phase in the model lifecycle that involves transforming raw data into a clean and usable format for modeling.
Data cleaning
removing duplicate records, correcting errors, and handling missing values to improve data quality.
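A small sketch of two cleaning steps on records stored as Python dicts: dropping exact duplicates and filling missing values with the median of the observed ones. Median imputation is only one option (mean, mode, or dropping the affected rows are others), and the function name, field names and records are hypothetical.

```python
from statistics import median


def clean(records, numeric_field):
    """Drop exact duplicate records, then fill missing (None) values of `numeric_field` with the median."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    observed = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill_value = median(observed)
    return [
        {**r, numeric_field: fill_value if r[numeric_field] is None else r[numeric_field]}
        for r in unique
    ]


# Hypothetical records: id 1 is duplicated and id 2 is missing its age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 1, "age": 34},
    {"id": 3, "age": 28},
    {"id": 4, "age": 41},
]
cleaned = clean(records, "age")
print(cleaned)
```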
Data transformation
converting data into appropriate formats, such as scaling numerical values or encoding categorical variables.
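Two common transformations, sketched in plain Python: min-max scaling, which maps a numeric column onto [0, 1], and one-hot encoding, which turns a categorical column into 0/1 indicator columns. The function names are hypothetical. In a real pipeline the minimum, maximum and category list would be learned from the training set only and then reused on the test set.

```python
def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to all zeros (a choice made here)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per distinct category (sorted)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


print(min_max_scale([10, 20, 30]))       # [0.0, 0.5, 1.0]
print(one_hot(["red", "green", "red"]))  # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
```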
Exploratory Data Analysis
analyzing data using statistics and visualizations to understand patterns, distributions, and relationships.
Feature engineering
creating new features from existing data to improve the model’s predictive capability.
Feature selection
selecting relevant features and removing unnecessary or irrelevant variables.
Data splitting
dividing the dataset into training and testing sets to properly train and evaluate the model.
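A minimal sketch of a random train/test split with a fixed seed so the split is reproducible. The 25% test fraction and the function name are assumptions; this is a hand-written stand-in, not a library function.

```python
import random


def split_train_test(rows, labels, test_fraction=0.25, seed=42):
    """Shuffle indices with a fixed seed, then hold out the first `test_fraction` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(ids, seq):
        return [seq[i] for i in ids]

    return take(train_idx, rows), take(test_idx, rows), take(train_idx, labels), take(test_idx, labels)


# Hypothetical data set of 8 examples; each label is derived from its row so pairs stay checkable.
rows = [[i, i * 2] for i in range(8)]
labels = [i % 2 for i in range(8)]
X_train, X_test, y_train, y_test = split_train_test(rows, labels)
print(len(X_train), len(X_test))  # 6 2
```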
Model Selection
This stage involves selecting the most appropriate algorithm or model to solve a specific problem based on the type of data and the desired outcome.
Cross-Validation
Divide the data into several subsets (folds) and train and test the model multiple times, each time holding out a different fold, to obtain a more reliable performance estimate and help detect overfitting.
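A sketch of how k-fold cross-validation partitions the data: each fold serves once as the validation set while the remaining folds are used for training, and the k validation scores are then averaged. It assumes the examples have already been shuffled; the function name is hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k folds over n examples.

    Each index appears in exactly one validation fold; fold sizes differ by at most one.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, val
        start += size


# 10 examples, 3 folds: validation folds of size 4, 3 and 3.
for train, val in k_fold_indices(10, 3):
    print("validate on", val, "| train on", len(train), "examples")
```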
Hyperparameter Tuning
Adjust the model’s settings (hyperparameters) to improve its accuracy and overall performance.
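A sketch of grid search, the simplest tuning strategy: try each candidate value of a hyperparameter and keep the one that scores best on held-out validation data. Here the hyperparameter is k in a small KNN classifier; the data (including one deliberately mislabeled training point) and all names are hypothetical. In practice each candidate would usually be scored with cross-validation, and the test set would stay untouched until the final evaluation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    return Counter(y for _, y in by_distance[:k]).most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def grid_search_k(train_X, train_y, val_X, val_y, candidates):
    """Score every candidate k on the validation set; the first best-scoring k wins ties."""
    best_k, best_score = None, -1.0
    scores = {}
    for k in candidates:
        scores[k] = accuracy(train_X, train_y, val_X, val_y, k)
        if scores[k] > best_score:
            best_k, best_score = k, scores[k]
    return best_k, scores


# Hypothetical data: two clusters plus one mislabeled training point at (2.5, 2.5).
train_X = [[1, 1], [1, 2], [2, 1], [2, 2], [6, 6], [6, 7], [7, 6], [7, 7], [2.5, 2.5]]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
val_X = [[2.4, 2.4], [1.5, 1.5], [6.5, 6.5]]
val_y = ["A", "A", "B"]
best_k, scores = grid_search_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5])
print(scores)
print("best k:", best_k)
```

Here k = 1 follows the mislabeled point and scores 2/3, while k = 3 and k = 5 both reach 1.0; the first of the tied values, 3, is kept.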
Model Evaluation
The stage at which the trained model is evaluated on previously unseen data.
Conduct Error Analysis
Analyze incorrect predictions to identify weaknesses and improve model accuracy.
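A minimal sketch of the bookkeeping behind error analysis: pull out the misclassified examples for manual inspection and count which (true, predicted) class pairs occur most often. The function name and the spam-filter predictions are hypothetical.

```python
from collections import Counter


def error_report(examples, y_true, y_pred):
    """Return the misclassified examples and a count of each (true, predicted) confusion."""
    mistakes = [(ex, t, p) for ex, t, p in zip(examples, y_true, y_pred) if t != p]
    confusions = Counter((t, p) for _, t, p in mistakes)
    return mistakes, confusions


# Hypothetical spam-filter output on five messages.
examples = ["msg1", "msg2", "msg3", "msg4", "msg5"]
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "spam"]
mistakes, confusions = error_report(examples, y_true, y_pred)
for ex, t, p in mistakes:
    print(f"{ex}: true={t}, predicted={p}")
print(confusions.most_common())  # [(('ham', 'spam'), 2), (('spam', 'ham'), 1)]
```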
Perform Sensitivity Analysis
Determine how changes in input features affect predictions to understand feature importance and model stability.
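A sketch of one simple form of sensitivity analysis, one-at-a-time perturbation: nudge each input feature by a small delta and measure how much the prediction changes per unit of change. The model here is a hypothetical fixed linear function, so the measured effects match its coefficients; for non-linear models the result depends on the chosen input point and delta, and other techniques (for example permutation importance) are also widely used.

```python
def sensitivity(model, x, delta=0.01):
    """Finite-difference effect of each feature: (model(x with x[i] + delta) - model(x)) / delta."""
    base = model(x)
    effects = []
    for i in range(len(x)):
        nudged = list(x)
        nudged[i] += delta
        effects.append((model(nudged) - base) / delta)
    return effects


def model(features):
    """Hypothetical trained model: a fixed linear score of three features (the third is ignored)."""
    return 3.0 * features[0] - 0.5 * features[1] + 0.0 * features[2] + 1.0


effects = sensitivity(model, [2.0, 4.0, 1.0])
print([round(e, 6) for e in effects])  # [3.0, -0.5, 0.0]
```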
Model Deployment
The final phase of the machine learning lifecycle, where the trained model is integrated into a production environment, allowing it to make predictions on new data.