4 Hyperparameters and tuning

Deep Learning with TensorFlow

Overview

  • The focus of this document is on hyperparameters and hyperparameter tuning in the context of deep learning using TensorFlow.

  • The course is oriented towards BCA Hons (V Sem) students.

Goal Of Training

  • The main objective of training deep neural networks is to achieve good model performance during inference (prediction) on unseen data, rather than merely to maximize training accuracy.

  • Inference: The stage when the model is tested against new data.

  • Training Time vs. Inference Time: Models are trained using specific datasets, and their effectiveness is evaluated using a separate test dataset (the test dataset must not have been seen during training).

  • Ultimately, the goal is to learn the optimal weights and biases for the neural network to perform effectively in practical scenarios.

Parameters and Hyperparameters

Model Parameters

  • Definition: Parameters learned during training from input data; they include:

    • Weights

    • Biases

Model Hyperparameters

  • Definition: Parameters that define the structure of the model and influence how model parameters are determined during training. Examples include:

    • Learning rate

    • Number of layers

    • Number of units in each layer

  • Hyperparameters are typically set manually and tuned during a cross-validation phase.
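As an illustration of how these hyperparameters appear in TensorFlow code, here is a minimal Keras sketch (the function name `build_model` and the specific default values are illustrative choices, not prescribed above). The arguments fix the model's structure; the weights and biases are learned later by `model.fit()`:

```python
import tensorflow as tf

def build_model(learning_rate=1e-3, num_hidden_layers=2, units_per_layer=64):
    """Hyperparameters (the arguments) fix the network's structure;
    the weights and biases are the parameters learned by model.fit()."""
    model = tf.keras.Sequential()
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(units_per_layer, activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_model(learning_rate=1e-3, num_hidden_layers=2, units_per_layer=64)
```

Changing any argument produces a structurally different model, which is why hyperparameter tuning amounts to rebuilding and retraining candidate models.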

Machine Learning Models

  • Model Definition: A model comprises both the hyperparameters that define how the neural network is structured and the parameters learned from data. Key elements include:

    • Topology of the network (layers, units, interconnections)

    • The learned parameters (weights and biases)

  • There exists a dependency between hyperparameters and the learned model parameters since hyperparameters dictate how parameter learning occurs.

Model Selection

  • Model selection involves optimizing the model's hyperparameters to ensure optimal inference performance.

  • Tuning is generally done through iterative validation strategies (e.g., validation sets, cross-validation).

  • Multiple models are tested during this phase; the final selected model is then evaluated on a separate test dataset so that the reported error is not biased by the selection process.
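A minimal sketch of this selection loop, using a toy one-parameter model and treating the learning rate as the hyperparameter under selection (the data, model, and candidate values are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise, split into train and validation sets.
x = rng.normal(size=(200, 1))
y = 3.0 * x[:, 0] + 0.1 * rng.normal(size=200)
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def train(lr, epochs=50):
    """Fit y = w*x by gradient descent; lr is the hyperparameter under test."""
    w = 0.0
    for _ in range(epochs):
        pred = w * x_train[:, 0]
        grad = 2 * np.mean((pred - y_train) * x_train[:, 0])
        w -= lr * grad
    return w

def val_loss(w):
    # Validation MSE guides model selection, never the test set.
    return float(np.mean((w * x_val[:, 0] - y_val) ** 2))

# Model selection: keep the candidate with the lowest validation loss.
candidates = [1e-4, 1e-2, 1e-1]
best_lr = min(candidates, key=lambda lr: val_loss(train(lr)))
```

Only after `best_lr` is fixed would the resulting model be scored once on the held-out test set.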

Training, Validation, and Test Sets

  • Training Set: Used to learn the model parameters (weights and biases).

  • Validation (Dev) Set: Used for model selection and to assist in determining the generalization error; it is essential for updating hyperparameters.

  • Test Set: Employed to evaluate the performance of the fully trained model under real-world conditions.

  • It is important to keep the test and validation sets distinct; reusing the test set during tuning leads to biased error-rate estimates during model evaluation.
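The three sets are typically created by randomly partitioning the available data. A sketch of such a split (the 70/15/15 proportions are a common convention, not a rule stated above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
indices = rng.permutation(n)  # shuffle before splitting

# A common convention: 70% train, 15% validation (dev), 15% test.
train_idx = indices[:700]   # learn weights and biases
val_idx = indices[700:850]  # tune hyperparameters / model selection
test_idx = indices[850:]    # final, one-time performance estimate
```

Because the indices come from one permutation, the three sets are guaranteed to be disjoint.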

The Design Process of Deep Learning

  • Iteration: Choosing hyperparameters (e.g., learning rate, number of layers) requires experimentation and iteration, since there is no way to determine the optimal values in advance.

Bias and Variance

Bias

  • Definition: Represents error stemming from incorrect assumptions in the learning algorithm that can lead to underfitting.

Variance

  • Definition: Refers to error caused by sensitivity to small fluctuations in the training data, potentially leading to overfitting by modeling random noise in the training data.

Trade-off

  • The optimal model must accurately reflect the training data while demonstrating reliable performance on unseen data. Balancing bias and variance is often challenging due to their opposing nature.

Capacity

  • Definition: Refers to a model’s ability to adapt and fit various functions.

  • Low capacity can lead to underfitting, while excessive capacity can capture noise from the training dataset, leading to overfitting.
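The effect of capacity can be sketched with polynomial regression, where the polynomial degree plays the role of capacity (the target function, degrees, and noise level below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a smooth target function.
x_train = np.sort(rng.uniform(-1, 1, 40))
y_train = np.sin(np.pi * x_train) + 0.2 * rng.normal(size=40)
x_val = np.sort(rng.uniform(-1, 1, 40))
y_val = np.sin(np.pi * x_val) + 0.2 * rng.normal(size=40)

def errors(degree):
    """Fit a polynomial of the given degree (the 'capacity' knob)
    and return (training MSE, validation MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    tr = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    va = np.mean((np.polyval(coef, x_val) - y_val) ** 2)
    return tr, va

tr1, va1 = errors(1)     # low capacity: underfits (high bias)
tr3, va3 = errors(3)     # adequate capacity
tr15, va15 = errors(15)  # high capacity: training error keeps falling,
                         # but validation error typically rises (variance)
```

Training error only decreases as degree grows; the validation error is what reveals when extra capacity has stopped helping.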

Regularization Techniques

L2 Regularization

  • Introduced to mitigate the risk of overfitting by adding a penalty for large weights during optimization, so that smaller weights are favored, which helps generalization.
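The penalty added to the data loss is λ · Σ wᵢ². A small numeric sketch (the weight values and λ are arbitrary):

```python
import numpy as np

w = np.array([0.5, -1.2, 3.0])  # current weights (arbitrary example values)
lam = 0.01                      # regularization strength, a hyperparameter

def l2_penalty(w, lam):
    # Term added to the data loss: lam * sum(w_i^2).
    return lam * np.sum(w ** 2)

def l2_grad(w, lam):
    # The penalty's contribution to the gradient, 2 * lam * w,
    # shrinks every weight toward zero at each update ("weight decay").
    return 2 * lam * w

penalty = l2_penalty(w, lam)  # 0.01 * (0.25 + 1.44 + 9.0)
```

In Keras, the same effect is obtained per layer with `kernel_regularizer=tf.keras.regularizers.l2(lam)`.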

Dropout Regularization

  • A technique that trains a model using a subnetwork by randomly dropping neurons during training to reduce the risk of over-reliance on any specific neuron, which encourages redundancy among neurons and aids generalization.
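A sketch of the standard "inverted dropout" formulation, in which survivors are rescaled at training time so that inference requires no change (the rate and array shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: zero out a fraction `rate` of units during
    training and scale survivors by 1/(1 - rate) so the expected
    activation is unchanged; at inference time it is the identity."""
    if not training:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

a = np.ones((4, 8))
out = dropout(a, rate=0.5)  # entries are either 0 or 1/keep_prob = 2.0
```

In Keras this is the `tf.keras.layers.Dropout(rate)` layer, which applies the mask only while training.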

Early Stopping

  • Used to halt training when performance on a validation set starts to degrade, thereby preventing overfitting; the model weights from the epoch with the lowest observed validation error are retained.
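The stopping rule can be sketched as a patience counter over the per-epoch validation losses (the loss values below are made up to show the typical improve-then-degrade pattern):

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch): stop once the validation loss
    has not improved for `patience` consecutive epochs; the weights
    from best_epoch (lowest validation loss) would be restored."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # halt; keep best_epoch's weights
    return len(val_losses) - 1, best_epoch

# Validation loss improves, then degrades as overfitting sets in:
stop, best = early_stopping([1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1],
                            patience=3)
```

In Keras this is provided by `tf.keras.callbacks.EarlyStopping(patience=..., restore_best_weights=True)`.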

Batch Normalization

  • Involves normalizing the input of each layer to speed up training and improve performance, reducing the internal covariate shift that can occur as weights are updated.
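The core computation can be sketched in NumPy: normalize each feature over the batch, then apply a learned scale and shift (the batch size, feature count, and data here are arbitrary; running statistics for inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch to zero mean / unit
    variance, then apply the learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # batch of 32, 4 features
out = batch_norm(x)
```

The Keras layer `tf.keras.layers.BatchNormalization()` additionally maintains moving averages of the batch statistics for use at inference time.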

Multiclass Classification

  • Softmax Activation Function: A generalization of the logistic (sigmoid) function that converts a vector of scores (logits) into a probability distribution over classes; in the binary case it reduces to logistic regression.

  • Softmax Loss Function: Cross-entropy loss applied to the softmax outputs; minimizing it is equivalent to maximum likelihood estimation in multiclass classification settings.
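Both definitions can be sketched directly in NumPy (the logits and class labels below are arbitrary example values; subtracting the maximum logit is the standard numerical-stability trick):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; output sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, true_class):
    # Negative log-likelihood of the correct class.
    return -np.log(probs[true_class])

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)                     # a probability distribution
loss = cross_entropy(p, true_class=0)   # small when p[true_class] is high
```

With two classes and one logit fixed at 0, `softmax` reduces to the logistic (sigmoid) function, matching the binary-classification case noted above.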