What is the motivation behind an activation function?
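Introduce non-linearity: map the linear output of a layer (Wx + b) into a non-linear space, so that stacked layers can model non-linear relationships instead of collapsing into a single linear map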
What are the 5 steps of DNN training?
Randomly initialize the weights W and biases b
Run the feed-forward pass to estimate the output y hat
Run back propagation to compute the gradients dW and db
Update W and b using gradient descent
Repeat steps 2, 3, and 4 until convergence
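The five steps can be sketched for the smallest possible case. This is only an illustration under stated assumptions: a single linear neuron, a mean-squared-error loss, and a fixed number of epochs standing in for a real convergence check. The function name and the toy data are hypothetical.

```python
import random

def train_linear_neuron(xs, ys, lr=0.1, epochs=500, seed=0):
    """Fit y_hat = w * x + b with plain gradient descent on mean squared error."""
    rng = random.Random(seed)
    # Step 1: randomly initialize the weight and bias.
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    n = len(xs)
    # Step 5: repeat steps 2-4 (a fixed epoch count is assumed here
    # instead of an explicit convergence test).
    for _ in range(epochs):
        # Step 2: feed forward to estimate y hat.
        y_hat = [w * x + b for x in xs]
        # Step 3: back propagation, i.e. gradients of the MSE loss.
        dw = sum(2 * (yh - y) * x for yh, y, x in zip(y_hat, ys, xs)) / n
        db = sum(2 * (yh - y) for yh, y in zip(y_hat, ys)) / n
        # Step 4: gradient descent update.
        w -= lr * dw
        b -= lr * db
    return w, b

# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_neuron(xs, ys)
```

On this toy data the learned w and b should end up close to 2 and 1. A real DNN repeats the same loop with matrices W and b for every layer.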
What are the nonlinear activation functions and their output ranges?
Sigmoid (0, 1)
Tanh (-1, 1)
ReLU [0, inf)
Leaky ReLU (-inf, inf)
ELU (-alpha, inf), where alpha is a positive constant, commonly 1
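A minimal sketch of the five functions on scalar inputs may help when checking the ranges by hand. The default `slope=0.01` and `alpha=1.0` are common choices assumed here, not values given in these cards.

```python
import math

def sigmoid(z):
    # Squashes any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes any real z into (-1, 1).
    return math.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs.
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # Like ReLU, but negative inputs keep a small slope (assumed 0.01).
    return z if z > 0 else slope * z

def elu(z, alpha=1.0):
    # Negative inputs approach -alpha smoothly instead of being cut to zero.
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
table = {f.__name__: [f(z) for z in samples]
         for f in (sigmoid, tanh, relu, leaky_relu, elu)}
```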
What is underfitting?
The model is too simple, so the error is too high on both the training data and the test data
What is overfitting?
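The model fits the training data too closely (in the extreme case exactly, noise included), so the training error is low but the error on test data is high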
What is dropout and what is its purpose?
Randomly removing (deactivating) neurons from the model during training to prevent overfitting
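A minimal sketch of dropout applied to one layer's activations. It assumes the common "inverted dropout" variant, which rescales the surviving activations so nothing needs to change at test time. That rescaling and the parameter names are assumptions, not part of the card above.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Zero each activation with probability p_drop (requires p_drop < 1)."""
    if not training or p_drop == 0.0:
        # At test time every neuron is kept.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    # Keep a neuron with probability `keep`; scale survivors by 1/keep
    # so the expected activation stays the same (inverted dropout).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = dropout([1.0] * 1000, p_drop=0.5, rng=random.Random(0))
```

With `p_drop=0.5`, roughly half of the entries in `out` are zero and the rest are scaled to 2.0.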
What is ensemble?
As an alternative to dropout, multiple networks are trained at the same time, and their predictions are counted as votes toward the final decision
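The voting step can be sketched as a simple majority vote. The three "models" below are hypothetical stand-ins (plain threshold functions) for trained networks.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties go to the label that was seen first (Counter ordering).
    """
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(models, x):
    # Each model casts one vote for its predicted label.
    return majority_vote([model(x) for model in models])

# Hypothetical stand-ins for three trained classifiers.
models = [lambda x: x > 0, lambda x: x > 1, lambda x: x > -1]
```

For `x = 0.5` two of the three models vote `True`, so the ensemble returns `True` even though one model disagrees.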
What is batch normalization?
In each batch, each feature is normalized across the samples in the batch. So a feature's z-score is computed from the mean and standard deviation of that same feature over all samples in the batch.
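A minimal sketch of the normalization step, with a batch stored as a list of samples (rows) and statistics taken per feature (column). The learnable scale and shift used in practice are left out, and `eps` is an assumed small constant that avoids division by zero.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) across the samples of one batch."""
    n = len(batch)
    n_features = len(batch[0])
    # Per-feature mean and variance, computed over all samples in the batch.
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(n_features)]
    # z-score of every value relative to its own feature's statistics.
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(n_features)]
            for row in batch]

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch)
```

Both columns end up with nearly identical z-scores even though the second feature is ten times larger, which is the point of normalizing per feature.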
What are the advantages of batch normalization?
Speeds up training
Handles internal covariate shift
Makes training stable
What is layer normalization?
Each sample is normalized across its features. So a value's z-score is computed from the mean and standard deviation of all features of that same sample
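A minimal sketch of layer normalization, where the statistics come from a single sample (row) rather than from the batch. As in most simplified examples, the learnable scale and shift are omitted, and `eps` is an assumed small constant.

```python
import math

def layer_norm(batch, eps=1e-5):
    """Normalize each sample (row) across its own features."""
    out = []
    for row in batch:
        # Mean and variance use only this sample's features.
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

alone = layer_norm([[1.0, 2.0, 3.0]])
in_batch = layer_norm([[1.0, 2.0, 3.0], [100.0, 0.0, 5.0]])
```

The first sample gets the same result whether it is normalized alone or together with other samples, which illustrates the independence from the batch.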
What are the advantages of layer normalization?
Removes dependency on batches of data samples
Easier to apply to recurrent neural networks
What is a minibatch and what is its purpose?
Purpose: more efficient/faster training
If we have a huge data set, say n = 100,000 samples, we partition it into m batches and use one batch per training iteration, so each update is computed on a small subset instead of the full data set
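The partitioning can be sketched as a small generator. Shuffling before splitting is a common practice assumed here, and the function name and tiny data set are hypothetical.

```python
import random

def minibatches(data, batch_size, seed=0):
    """Yield the data set in shuffled chunks of at most batch_size samples."""
    indices = list(range(len(data)))
    # Shuffle so that each batch is a random subset (an assumed convention).
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

data = list(range(10))
batches = list(minibatches(data, batch_size=4))
```

Ten samples with a batch size of 4 give three batches of sizes 4, 4, and 2. One training iteration would consume one of them.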
What is learning rate?
The multiplier applied to the gradient to determine how far we move along the (negative) gradient direction in each update. It can be held constant, but it is often reduced over time as we get closer to convergence
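A minimal sketch of how the learning rate enters the update, minimizing f(w) = w^2 (gradient 2w). The inverse-time decay schedule is just one possible way to reduce the rate over time and is an assumption here, as are the function names.

```python
def gradient_step(w, grad, lr):
    # Move against the gradient, scaled by the learning rate.
    return w - lr * grad

def decayed_lr(lr0, step, decay=0.01):
    # Inverse-time decay: the rate shrinks as training progresses (assumed schedule).
    return lr0 / (1.0 + decay * step)

w = 5.0
for step in range(200):
    lr = decayed_lr(0.1, step)
    w = gradient_step(w, 2 * w, lr)  # gradient of w^2 is 2w
```

Early steps are larger and later steps are smaller, and `w` ends up very close to the minimum at 0.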
What is the advantage of a convolutional neural network over an MLP?
The MLP may destroy spatial information during vectorization (flattening the input into a 1-D vector), whereas the CNN maintains spatial information
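The difference can be illustrated on a tiny image. This is a sketch with hypothetical helper names, and the "convolution" is the unflipped sliding-window form commonly used in deep learning (strictly a cross-correlation), with no padding.

```python
def flatten(image):
    # What an MLP input layer sees: one long vector, rows laid end to end.
    return [px for row in image for px in row]

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; the output is still a 2-D grid."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A 3x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1]]  # responds where a pixel is brighter than its left neighbor

vector = flatten(image)
feature_map = conv2d_valid(image, edge_kernel)
```

The feature map marks the edge in the same column of every row, so the position is preserved. In the flattened vector, vertically adjacent pixels sit four entries apart and the grid structure is no longer explicit.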