4 Hyperparameters and tuning

Deep Learning with TensorFlow

Overview

  • The focus of this document is on hyperparameters and hyperparameter tuning in the context of deep learning using TensorFlow.

  • The course is oriented towards BCA Hons (V Sem) students.

Goal Of Training

  • The main objective of training deep neural networks is to achieve good model performance during inference (prediction) on unseen data, rather than merely to maximize training accuracy.

  • Inference: The stage when the model is tested against new data.

  • Training Time vs. Inference Time: Models are trained using specific datasets, and their effectiveness is evaluated using a separate test dataset (the test dataset must not have been seen during training).

  • Ultimately, the goal is to learn the optimal weights and biases for the neural network to perform effectively in practical scenarios.

Parameters and Hyperparameters

Model Parameters

  • Definition: Parameters learned during training from input data; they include:

    • Weights

    • Biases

Model Hyperparameters

  • Definition: Parameters that define the structure of the model and influence how model parameters are determined during training. Examples include:

    • Learning rate

    • Number of layers

    • Number of units in each layer

  • Hyperparameters are typically set manually and tuned during a cross-validation phase.
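As an illustration of how these hyperparameters appear in TensorFlow code, here is a minimal Keras sketch (the function name `build_model` and the specific default values are illustrative choices, not prescribed above). The arguments fix the model's structure; the weights and biases are learned later by `model.fit()`:

```python
import tensorflow as tf

def build_model(learning_rate=1e-3, num_hidden_layers=2, units_per_layer=64):
    """Hyperparameters (the arguments) fix the network's structure;
    the weights and biases are the parameters learned by model.fit()."""
    model = tf.keras.Sequential()
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(units_per_layer, activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_model(learning_rate=1e-3, num_hidden_layers=2, units_per_layer=64)
```

Changing any argument produces a structurally different model, which is why hyperparameter tuning amounts to rebuilding and retraining candidate models.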

Machine Learning Models

  • Model Definition: A model comprises both the hyperparameters that define how the neural network is structured and the parameters learned from data. Key elements include:

    • Topology of the network (layers, units, interconnections)

    • The learned parameters (weights and biases)

  • There exists a dependency between hyperparameters and the learned model parameters since hyperparameters dictate how parameter learning occurs.

Model Selection

  • Model selection involves optimizing the model's hyperparameters to ensure optimal inference performance.

  • Tuning is generally done through iterative validation strategies (e.g., validation sets, cross-validation).

  • Multiple models are tested during this phase; the final selected model is then evaluated on a separate test dataset so that the reported error is not biased by the selection process.
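A minimal sketch of this selection loop, using a toy one-parameter model and treating the learning rate as the hyperparameter under selection (the data, model, and candidate values are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise, split into train and validation sets.
x = rng.normal(size=(200, 1))
y = 3.0 * x[:, 0] + 0.1 * rng.normal(size=200)
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def train(lr, epochs=50):
    """Fit y = w*x by gradient descent; lr is the hyperparameter under test."""
    w = 0.0
    for _ in range(epochs):
        pred = w * x_train[:, 0]
        grad = 2 * np.mean((pred - y_train) * x_train[:, 0])
        w -= lr * grad
    return w

def val_loss(w):
    # Validation MSE guides model selection, never the test set.
    return float(np.mean((w * x_val[:, 0] - y_val) ** 2))

# Model selection: keep the candidate with the lowest validation loss.
candidates = [1e-4, 1e-2, 1e-1]
best_lr = min(candidates, key=lambda lr: val_loss(train(lr)))
```

Only after `best_lr` is fixed would the resulting model be scored once on the held-out test set.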

Training, Validation, and Test Sets

  • Training Set: Used to learn the model parameters (weights and biases).

  • Validation (Dev) Set: Used for model selection and to assist in determining the generalization error; it is essential for updating hyperparameters.

  • Test Set: Employed to evaluate the performance of the fully trained model under real-world conditions.

  • It is important to keep the test and validation sets distinct; reusing the test set during tuning leads to biased error-rate estimates during model evaluation.
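The three sets are typically created by randomly partitioning the available data. A sketch of such a split (the 70/15/15 proportions are a common convention, not a rule stated above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
indices = rng.permutation(n)  # shuffle before splitting

# A common convention: 70% train, 15% validation (dev), 15% test.
train_idx = indices[:700]   # learn weights and biases
val_idx = indices[700:850]  # tune hyperparameters / model selection
test_idx = indices[850:]    # final, one-time performance estimate
```

Because the indices come from one permutation, the three sets are guaranteed to be disjoint.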

The Design Process of Deep Learning

  • Iteration: Choosing hyperparameters (e.g., learning rate, number of layers) requires experimentation and iteration, since there is no way to determine the optimal values in advance.

Bias and Variance

Bias

  • Definition: Represents error stemming from incorrect assumptions in the learning algorithm that can lead to underfitting.

Variance

  • Definition: Refers to error caused by sensitivity to small fluctuations in the training data, potentially leading to overfitting by modeling random noise in the training data.

Trade-off

  • The optimal model must accurately reflect the training data while demonstrating reliable performance on unseen data. Balancing bias and variance is often challenging due to their opposing nature.

Capacity

  • Definition: Refers to a model’s ability to adapt and fit various functions.

  • Low capacity can lead to underfitting, while excessive capacity can capture noise from the training dataset, leading to overfitting.
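The effect of capacity can be sketched with polynomial regression, where the polynomial degree plays the role of capacity (the target function, degrees, and noise level below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a smooth target function.
x_train = np.sort(rng.uniform(-1, 1, 40))
y_train = np.sin(np.pi * x_train) + 0.2 * rng.normal(size=40)
x_val = np.sort(rng.uniform(-1, 1, 40))
y_val = np.sin(np.pi * x_val) + 0.2 * rng.normal(size=40)

def errors(degree):
    """Fit a polynomial of the given degree (the 'capacity' knob)
    and return (training MSE, validation MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    tr = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    va = np.mean((np.polyval(coef, x_val) - y_val) ** 2)
    return tr, va

tr1, va1 = errors(1)     # low capacity: underfits (high bias)
tr3, va3 = errors(3)     # adequate capacity
tr15, va15 = errors(15)  # high capacity: training error keeps falling,
                         # but validation error typically rises (variance)
```

Training error only decreases as degree grows; the validation error is what reveals when extra capacity has stopped helping.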

Regularization Techniques

L2 Regularization

  • Introduced to mitigate the risk of overfitting by adding a penalty for large weights during optimization, so that smaller weights are favored, which helps generalization.
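The penalty added to the data loss is λ · Σ wᵢ². A small numeric sketch (the weight values and λ are arbitrary):

```python
import numpy as np

w = np.array([0.5, -1.2, 3.0])  # current weights (arbitrary example values)
lam = 0.01                      # regularization strength, a hyperparameter

def l2_penalty(w, lam):
    # Term added to the data loss: lam * sum(w_i^2).
    return lam * np.sum(w ** 2)

def l2_grad(w, lam):
    # The penalty's contribution to the gradient, 2 * lam * w,
    # shrinks every weight toward zero at each update ("weight decay").
    return 2 * lam * w

penalty = l2_penalty(w, lam)  # 0.01 * (0.25 + 1.44 + 9.0)
```

In Keras, the same effect is obtained per layer with `kernel_regularizer=tf.keras.regularizers.l2(lam)`.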

Dropout Regularization

  • A technique that trains a model using a subnetwork by randomly dropping neurons during training to reduce the risk of over-reliance on any specific neuron, which encourages redundancy among neurons and aids generalization.
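A sketch of the standard "inverted dropout" formulation, in which survivors are rescaled at training time so that inference requires no change (the rate and array shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: zero out a fraction `rate` of units during
    training and scale survivors by 1/(1 - rate) so the expected
    activation is unchanged; at inference time it is the identity."""
    if not training:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

a = np.ones((4, 8))
out = dropout(a, rate=0.5)  # entries are either 0 or 1/keep_prob = 2.0
```

In Keras this is the `tf.keras.layers.Dropout(rate)` layer, which applies the mask only while training.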

Early Stopping

  • Used to halt training when performance on a validation set starts to degrade, thereby preventing overfitting; the model weights from the epoch with the lowest observed validation error are retained.
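The stopping rule can be sketched as a patience counter over the per-epoch validation losses (the loss values below are made up to show the typical improve-then-degrade pattern):

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch): stop once the validation loss
    has not improved for `patience` consecutive epochs; the weights
    from best_epoch (lowest validation loss) would be restored."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # halt; keep best_epoch's weights
    return len(val_losses) - 1, best_epoch

# Validation loss improves, then degrades as overfitting sets in:
stop, best = early_stopping([1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1],
                            patience=3)
```

In Keras this is provided by `tf.keras.callbacks.EarlyStopping(patience=..., restore_best_weights=True)`.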

Batch Normalization

  • Involves normalizing the input of each layer to speed up training and improve performance, reducing the internal covariate shift that can occur as weights are updated.
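The core computation can be sketched in NumPy: normalize each feature over the batch, then apply a learned scale and shift (the batch size, feature count, and data here are arbitrary; running statistics for inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch to zero mean / unit
    variance, then apply the learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # batch of 32, 4 features
out = batch_norm(x)
```

The Keras layer `tf.keras.layers.BatchNormalization()` additionally maintains moving averages of the batch statistics for use at inference time.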

Multiclass Classification

  • Softmax Activation Function: A generalization of the logistic (sigmoid) function that converts a vector of scores (logits) into a probability distribution over classes; in the binary case it reduces to logistic regression.

  • Softmax Loss Function: Cross-entropy loss applied to the softmax outputs; minimizing it is equivalent to maximum likelihood estimation in multiclass classification settings.
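Both definitions can be sketched directly in NumPy (the logits and class labels below are arbitrary example values; subtracting the maximum logit is the standard numerical-stability trick):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; output sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, true_class):
    # Negative log-likelihood of the correct class.
    return -np.log(probs[true_class])

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)                     # a probability distribution
loss = cross_entropy(p, true_class=0)   # small when p[true_class] is high
```

With two classes and one logit fixed at 0, `softmax` reduces to the logistic (sigmoid) function, matching the binary-classification case noted above.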