4 Hyperparameters and tuning
Deep Learning with TensorFlow
Overview
The focus of this document is on hyperparameters and hyperparameter tuning in the context of deep learning using TensorFlow.
The course is oriented towards BCA Hons (V Sem) students.
Goal Of Training
The main objective of training deep neural networks is to enhance model performance during inference (prediction) on unseen data, rather than merely to maximize training accuracy.
Inference: The stage when the model is tested against new data.
Training Time vs. Inference Time: Models are trained using specific datasets, and their effectiveness is evaluated using a separate test dataset (the test dataset must not have been seen during training).
Ultimately, the goal is to learn the optimal weights and biases for the neural network to perform effectively in practical scenarios.
Parameters and Hyperparameters
Model Parameters
Definition: Parameters learned during training from input data; they include:
Weights
Biases
Model Hyperparameters
Definition: Parameters that define the structure of the model and influence how model parameters are determined during training. Examples include:
Learning rate
Number of layers
Number of units in each layer
Hyperparameters are typically set manually and tuned during a cross-validation phase.
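The distinction can be made concrete with a single gradient-descent update: the learning rate (and the number of steps) are hyperparameters fixed before training, while the weight is a parameter the update rule changes. A minimal pure-Python sketch; the quadratic loss here is an illustrative assumption, not a real model:

```python
# Minimal sketch: gradient descent on a single weight.
# The quadratic loss below is an illustrative assumption, not a real network.

def grad(w):
    # Derivative of the loss (w - 3.0)**2, which is minimized at w = 3.0
    return 2.0 * (w - 3.0)

learning_rate = 0.1   # hyperparameter: chosen by hand before training
w = 0.0               # parameter: learned during training

for _ in range(100):  # number of steps is also a hyperparameter
    w -= learning_rate * grad(w)

print(round(w, 4))    # w converges toward 3.0
```

Changing the hyperparameter (e.g., a learning rate of 1.5 instead of 0.1) changes how, or whether, the parameter converges, which is exactly the dependency described above.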
Machine Learning Models
Model Definition: A model comprises the hyperparameters that define how the neural network is structured together with the parameters learned during training. Key elements include:
Topology of the network (layers, units, interconnections)
The learned parameters (weights and biases)
There exists a dependency between hyperparameters and the learned model parameters since hyperparameters dictate how parameter learning occurs.
Model Selection
Model selection involves optimizing the model's hyperparameters to ensure optimal inference performance.
Tuning is generally done through iterative validation strategies (e.g., validation sets, cross-validation).
Multiple models are tested during this phase; the final selected model is then evaluated on a separate test dataset, since repeatedly selecting against the validation set can itself lead to overfitting.
Training, Validation, and Test Sets
Training Set: Used to learn the model parameters (weights and biases).
Validation (Dev) Set: Used for model selection and to assist in determining the generalization error; it is essential for updating hyperparameters.
Test Set: Employed to evaluate the performance of the fully trained model under real-world conditions.
It is important to keep the test and validation sets distinct to avoid biased error-rate estimates during model evaluation.
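The three-way split can be sketched as follows; the 80/10/10 proportions and the toy dataset are illustrative choices, not a rule:

```python
import random

# Illustrative dataset of 100 examples (indices stand in for real samples).
data = list(range(100))
random.seed(0)
random.shuffle(data)  # shuffle before splitting to avoid ordering bias

n = len(data)
n_train = int(0.8 * n)  # 80% for training
n_val = int(0.1 * n)    # 10% for validation (dev)

train_set = data[:n_train]               # learn weights and biases
val_set = data[n_train:n_train + n_val]  # tune hyperparameters, select models
test_set = data[n_train + n_val:]        # final, one-time evaluation

print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

The three slices are disjoint by construction, which is what keeps the test-set error estimate unbiased.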
The Design Process of Deep Learning
Iteration: Choosing hyperparameters (e.g., learning rate, number of layers) requires experimentation and iteration, since no method exists to determine the optimal hyperparameters in advance.
Bias and Variance
Bias
Definition: Represents error stemming from incorrect assumptions in the learning algorithm that can lead to underfitting.
Variance
Definition: Refers to error caused by sensitivity to small fluctuations in the training data, potentially leading to overfitting by modeling random noise in the training data.
Trade-off
The optimal model must accurately reflect the training data while demonstrating reliable performance on unseen data. Balancing bias and variance is often challenging due to their opposing nature.
Capacity
Definition: Refers to a model’s ability to adapt and fit various functions.
Low capacity can lead to underfitting, while excessive capacity can capture noise from the training dataset leading to overfitting.
Regularization Techniques
L2 Regularization
Mitigates the risk of overfitting by adding a penalty proportional to the sum of squared weights to the loss during optimization, so that smaller weights are favored and generalization is maintained.
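In TensorFlow/Keras this is typically supplied per layer via `kernel_regularizer=tf.keras.regularizers.l2(...)`; the underlying math is just a squared-weight penalty added to the data loss. A stdlib sketch, where the weights, data loss, and lambda are made-up values:

```python
# Sketch of L2 regularization: total loss = data loss + lambda * sum(w^2).
weights = [0.5, -1.2, 2.0]   # illustrative weight values
data_loss = 0.42             # illustrative unregularized loss
lam = 0.01                   # regularization strength (a hyperparameter)

l2_penalty = lam * sum(w * w for w in weights)
total_loss = data_loss + l2_penalty

print(round(l2_penalty, 4))  # 0.0569
```

Because the penalty grows with the square of each weight, the optimizer is pushed toward solutions with many small weights rather than a few large ones.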
Dropout Regularization
A technique that trains a model using a subnetwork by randomly dropping neurons during training to reduce the risk of over-reliance on any specific neuron, which encourages redundancy among neurons and aids generalization.
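In Keras this is `tf.keras.layers.Dropout(rate)`; conceptually, each activation is zeroed with probability `rate` during training, and with "inverted dropout" the survivors are scaled up so the expected activation is unchanged at test time. A stdlib sketch with illustrative activation values:

```python
import random

def dropout(activations, rate, training=True):
    """Inverted dropout: zero each unit with probability `rate` and scale
    survivors by 1/(1-rate) so expected values match at inference time."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(1)
acts = [1.0, 2.0, 3.0, 4.0]      # illustrative layer activations
print(dropout(acts, rate=0.5))   # some entries zeroed, survivors doubled
```

Because any neuron may vanish on a given step, no downstream unit can rely on one specific input, which is the redundancy effect described above.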
Early Stopping
Used to halt training when performance on a validation set starts to degrade, thereby preventing overfitting while tuning hyperparameters based on the lowest observed validation error.
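Keras provides this as the `tf.keras.callbacks.EarlyStopping` callback (with a `patience` argument); the core logic is simple to sketch. The validation-loss sequence below is made up for illustration:

```python
# Sketch of early stopping with patience: stop when the validation loss
# has not improved for `patience` consecutive epochs.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.54]  # illustrative

patience = 2        # hyperparameter: epochs to wait without improvement
best = float("inf")
best_epoch = 0
wait = 0

for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, best_epoch, wait = loss, epoch, 0  # new best: reset counter
    else:
        wait += 1
        if wait >= patience:
            print(f"stopping at epoch {epoch}; best was epoch {best_epoch}")
            break
```

In practice the weights from the best epoch are restored (Keras's `restore_best_weights=True`), so the final model corresponds to the lowest observed validation error.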
Batch Normalization
Involves normalizing the input of each layer to speed up training and improve performance, reducing the internal covariate shift that can occur as weights are updated.
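In Keras this is `tf.keras.layers.BatchNormalization`; the core computation normalizes each feature over the mini-batch to zero mean and unit variance, then applies a learnable scale (gamma) and shift (beta). A stdlib sketch over one illustrative feature column:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of values for one feature, then scale/shift.
    gamma and beta are learnable parameters; eps avoids division by zero."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

values = [2.0, 4.0, 6.0, 8.0]          # illustrative pre-activations
normed = batch_norm(values)
print([round(v, 3) for v in normed])   # roughly zero mean, unit variance
```

Keeping each layer's input distribution stable in this way is what counters the internal covariate shift mentioned above.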
Multiclass Classification
Softmax Activation Function: Converts a vector of raw scores (logits) into a probability distribution over the classes; in the binary case it reduces to the logistic (sigmoid) function used in logistic regression.
Softmax Loss Function: Uses cross-entropy loss with softmax outputs; minimizing cross-entropy corresponds to maximum likelihood estimation in multiclass classification.
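The two fit together: softmax turns logits into class probabilities, and cross-entropy compares that distribution to the true label (TensorFlow combines them in ops such as `tf.nn.softmax_cross_entropy_with_logits`). A stdlib sketch with illustrative logits:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    # Negative log-likelihood of the true class.
    return -math.log(probs[true_class])

logits = [2.0, 1.0, 0.1]      # illustrative raw scores for 3 classes
probs = softmax(logits)
print(round(sum(probs), 6))   # probabilities sum to 1.0
loss = cross_entropy(probs, true_class=0)
```

The loss is small when the model assigns high probability to the correct class and grows without bound as that probability approaches zero, which is why minimizing it performs maximum likelihood estimation.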