machine learning workflow steps
data preprocessing
model training
model evaluation
model selection
model testing
data sets used for training AI
training dataset
validation dataset
testing dataset
data preprocessing
preparing raw data for model training and analysis, including
data cleaning
data transformation
feature engineering
encoding categorical data
feature selection
data splitting
data cleaning
addressing missing values through imputation or removal, correcting errors, and addressing outliers
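A minimal data-cleaning sketch with pandas; the column names and values below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy dataframe with missing values and an implausible outlier (columns are hypothetical)
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 250],
    "income": [40000, 52000, 61000, np.nan, 58000],
})

# Impute missing values with the column median
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Treat the implausible age as an outlier/error and clip it to a sensible range
df["age"] = df["age"].clip(lower=0, upper=120)
print(df)
```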
data transformation
standardizing numerical data to ensure consistency in scale and distribution
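A minimal standardization sketch, assuming scikit-learn's StandardScaler; the feature matrix is toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])  # toy feature matrix with very different scales

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and unit variance
print(X_scaled)
```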
feature engineering
creating new, informative features, or modifying existing ones, based on domain knowledge
encoding categorical data
converting non-numerical categories into a numerical format
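A minimal encoding sketch using pandas one-hot encoding; the "color" column is a hypothetical categorical feature:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # hypothetical categorical column
encoded = pd.get_dummies(df, columns=["color"])  # one 0/1 column per category
print(encoded)
```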
feature selection
identifying and retaining the most relevant features to reduce data dimensionality
data splitting
dividing the preprocessed data into training, validation, and test sets
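A minimal splitting sketch, assuming scikit-learn's train_test_split and a toy dataset; the 60/20/20 ratio is just one common choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(50, 2), np.arange(50)  # toy features and labels

# Carve out the test set first, then split the remainder into train and validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
# Result: 60% training, 20% validation, 20% test
```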
model training
feeding the training data into machine learning algorithms to train models by tuning parameters such as weights to minimize loss
model training process
feeding a training dataset
comprising features (input variables)
labels (output variables)
model training objective
adjusting the parameters of the model so it can accurately map inputs to outputs; this typically involves minimizing a loss function
loss function
quantifies the difference between the predicted outputs of the model and the actual outputs in the training data
how to minimize the loss function
optimization algorithms; they iteratively adjust the model's parameters, such as the weights in a neural network, to minimize the difference between predictions and actual results
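A minimal gradient-descent sketch on a mean-squared-error loss; the toy data, single weight, and learning rate are illustrative, not a production training loop:

```python
import numpy as np

# Toy data: y is roughly 3 * x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3 * x + rng.normal(0, 0.1, 100)

w = 0.0    # the model parameter (weight) being learned
lr = 0.1   # learning rate, a hyperparameter set before training

for _ in range(500):
    y_pred = w * x
    grad = 2 * np.mean((y_pred - y) * x)  # gradient of the MSE loss w.r.t. w
    w -= lr * grad                        # step the weight against the gradient
print(w)  # ends up close to 3, minimizing the loss
```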
model evaluation
assesses the trained models on the validation set using metrics relevant to the problem
how is model performance measured
accuracy, precision, recall, F1 score for classification tasks, and MSE and MAE for regression tasks
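A minimal sketch of these metrics, assuming scikit-learn; the labels and predictions are made up:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, mean_absolute_error)

# Classification: toy true vs. predicted labels
y_true, y_pred = [1, 0, 1, 1, 0], [1, 0, 0, 1, 0]
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))

# Regression: toy true vs. predicted values
y_true_r, y_pred_r = [2.5, 0.0, 2.0], [3.0, -0.5, 2.0]
print(mean_squared_error(y_true_r, y_pred_r), mean_absolute_error(y_true_r, y_pred_r))
```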
hyperparameter tuning
part of the model evaluation process; adjusting the configuration settings that structure the machine learning model in order to enhance its performance and generalization
examples of hyperparameters
learning rate
number of layers in the neural network
number of trees in a random forest
goal of hyperparameter tuning
find the “sweet spot” where the model is complex enough to capture the underlying patterns in the data, but not so complex that it overfits to the training dataset
parameter vs hyperparameter
a parameter is a variable that is learned from the data during the training process, a hyperparameter is a variable that is set before the training period begins to control the learning process
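A minimal hyperparameter-tuning sketch, assuming scikit-learn's GridSearchCV with a random forest; the grid values and toy dataset are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)  # toy dataset

# Hyperparameters are set before training; the trees' split thresholds are
# parameters learned during training
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```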
overfitting
when the model learns the training data too well, leading to poor generalization on new data
underfitting
when the model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training sets and test sets
model selection
picking the best performer on the validation set; if multiple models perform similarly, the simplest model is preferred
goal of model selection
to pick the optimal model for deployment
factors that are used to find the optimal model for deployment
model simplicity
training time
resource requirements
ease of interpretation
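A minimal validation-based model selection sketch, assuming scikit-learn; the candidate models, toy data, and accuracy metric are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Train each candidate on the training set, then compare on the validation set
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: accuracy_score(y_val, model.fit(X_train, y_train).predict(X_val))
          for name, model in candidates.items()}
print(scores)  # pick the best performer; prefer the simpler model when scores are close
```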
model testing
evaluating the chosen model on an unseen test dataset, which is a representative subset of the data that is expected to show up in the actual environment
dataset splitting goal
developing models that not only perform well on known data but also generalize effectively to new, unseen data
training dataset
the largest portion of the data, used to train the model; it allows the model to learn patterns and relationships within the data by adjusting its parameters
validation dataset
used to tune hyperparameters and evaluate the model's performance during training; it helps prevent overfitting and allows for model selection and optimization
test dataset
kept completely separate and only used to assess the final model's performance; it helps detect any overfitting and ensures the model will work well when deployed in real-life scenarios
the importance of splitting
preventing overfitting
model selection, choosing between different models and hyperparameters without bias
unbiased performance estimation
iterative improvement, refining the model without contaminating the test dataset
the challenge of overfitting
adjusting the parameters solely based on the training dataset can lead to the model learning not only the general patterns, but also the peculiarities specific to the dataset
how to address overfitting
we continue training the model while monitoring its performance on both the training and validation datasets
what indicates overfitting
errors on the validation dataset may start to increase while errors on the training dataset continue to drop
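A minimal early-stopping sketch that monitors validation loss, assuming scikit-learn's SGDClassifier (on older scikit-learn versions the loss name is "log" rather than "log_loss"); the patience value and toy data are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.unique(y_train)
best_val, patience, wait = np.inf, 5, 0

for epoch in range(100):
    model.partial_fit(X_train, y_train, classes=classes)    # one training pass
    val_loss = log_loss(y_val, model.predict_proba(X_val))  # error on held-out data
    if val_loss < best_val:
        best_val, wait = val_loss, 0
    else:
        wait += 1             # validation error rising while training error keeps dropping
        if wait >= patience:  # stop early: the model is starting to overfit
            break
```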
why do we need a separate test set
detecting indirect optimization, a subtle form of overfitting
avoiding information leakage
unbiased final evaluation
detecting validation set overfitting
reliable performance reporting
enhancing credibility
neural network
designed to simulate the way a human brain analyzes and processes information
neural network function
consists of layers of interconnected nodes, each of which performs a simple computation; the output of each node is passed through an activation function, which helps standardize the output for the next layer
layers of a neural network
input layer
hidden layers
output layer
input layer
inputs are fed into the network
output layer
produces the network's final predictions; its size and activation function depend on the task (e.g. regression or classification)
hidden layer
performs most of the computations, transforming inputs into features the model uses to make predictions; sits between the input and the output, processing data with weights and activation functions to learn patterns
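A minimal NumPy forward-pass sketch through one hidden layer; the layer sizes, random weights, and ReLU choice are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))          # one sample with 4 features (input layer)

W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)  # hidden layer: 4 inputs -> 3 nodes
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)  # output layer: 3 hidden nodes -> 1 output

h = relu(x @ W1 + b1)   # weighted sum plus activation at the hidden layer
y_hat = h @ W2 + b2     # linear output, e.g. for a regression task
print(y_hat)
```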
activation functions
determine the output of a node, primarily to introduce non-linearity; without them the network could not handle the complex, non-linear patterns found in most real-world data
sigmoid (logistic) function
takes any real-valued number and maps it into a value between 0 and 1; used for models where we need to predict the probability of an output
tanh (hyperbolic tangent) function
maps real-valued numbers to values in the range of -1 to 1, useful when the model needs to predict values that are normalized
ReLU (rectified linear unit) function
allows only positive values to pass through it and maps negative values to zero; allows models to converge faster and reduces the likelihood of vanishing gradients
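A minimal NumPy sketch of these three activation functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # maps any real number into (0, 1)

def tanh(z):
    return np.tanh(z)                # maps any real number into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # keeps positives, zeroes out negatives

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```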
output layer activation function
chosen according to the specific task, shapes the output into a form that matches the problem statement
regression
type of output layer activation function; none (or a linear function) is used when the output is a real-valued prediction
binary classification
type of output layer activation function, the sigmoid function is typically used because it maps predictions to a probability distribution between two classes
multiclass classification
type of output layer activation function, the softmax function is used to produce a probability distribution over multiple classes
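A minimal NumPy softmax sketch; the logits are made-up output-layer scores:

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)         # normalize so the outputs sum to 1

logits = np.array([2.0, 1.0, 0.1])  # made-up raw scores for 3 classes
print(softmax(logits))              # a probability distribution over the classes
```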