Artificial Neural Network (ANN)
Consists of many computation units called neurons
Learns a complex mapping function that maps any input X to any output Y by training with lots of data
Deep Learning
An ANN with many layers, which allows it to learn increasingly complex concepts out of simpler ones
Artificial Neuron
Multiplies each input feature by a corresponding weight and then adds these values together with a bias term
This value is then put through a function called the activation function
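A minimal Python/NumPy sketch of a single neuron (the feature, weight, and bias values are assumptions for illustration):

```python
import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, passed through a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input features (assumed values)
w = np.array([0.4, 0.1, -0.7])   # one weight per input feature
b = 0.2                          # bias term
print(neuron(x, w, b))
```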
Recurrent Neural Network (RNN)
Neurons receive feedback from their own outputs
Used for processing data with time or sequence information
Successfully used for machine translation, language modelling and time series prediction
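A hedged Keras sketch of a simple recurrent layer (sequence length, feature count, and unit count are assumptions):

```python
from tensorflow import keras

# Reads a sequence of 10 time steps, each with 8 features; the recurrent
# layer feeds its own output back in at every step.
model = keras.Sequential([
    keras.layers.SimpleRNN(32, input_shape=(10, 8)),
    keras.layers.Dense(1)   # e.g. next-value prediction for a time series
])
model.summary()
```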
Convolutional Neural Network (CNN)
Neurons are connected to small sets of inputs only
Efficient for computer vision tasks such as object detection and image classification
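A hedged Keras sketch of a small convolutional network (image size, filter count, and class count are assumptions):

```python
from tensorflow import keras

# Each convolutional neuron looks only at a small 3x3 patch of the image.
model = keras.Sequential([
    keras.layers.Conv2D(16, kernel_size=3, activation="relu",
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax")   # e.g. 10 image classes
])
model.summary()
```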
Purpose of Activation Functions
Firing Decision
Helps decide whether a neuron should fire: it fires only if it is relevant to the prediction (acts as a mathematical gate)
Bounded Values
Some activation functions provide a bound to the output values. This provides more stability during training
Non-linearity
Introduce non-linearity to the network
Most interesting real-world problems are non-linear in nature, which requires a non-linear neural network to handle them
Types of Activation Functions
Linear Function
Sigmoid/Logistic
Tanh (Hyperbolic Tangent)
SoftMax
RELU (Rectified Linear Unit)
Leaky ReLU (Leaky Rectified Linear Unit)
SELU (Scaled Exponential Linear Unit)
PReLU (Parametric Rectified Linear Unit)
SoftPlus
Linear Function
g(x) = x
Output is proportional to the input multiplied by weight
Generally used for regression at the output layer
Sigmoid Function
g(x) = 1/(1+e^-x)
Output values bounded between 0 and 1
Generally used for binary prediction at the output layer
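A minimal NumPy sketch showing the bounded (0, 1) output (the input values are assumptions):

```python
import numpy as np

def sigmoid(x):
    # Maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # approx [0.00005, 0.5, 0.99995]
```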
TanH Function
g(x) = tanh(x)
Default activation for RNN layer
Softmax Function
One output for each class
Values are normalized to between 0 and 1 such that the values for all classes sum to 1
This allows comparison and thus supports multi-class classification
Commonly used in output layer for a multi-class classifier with the cross-entropy loss function
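A minimal NumPy sketch of softmax over three class scores (the score values are assumptions):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize so outputs sum to 1
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs, probs.sum())            # approx [0.659 0.242 0.099], sums to 1.0
```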
RELU (Rectified Linear Unit) Function
g(x) = max(0, x)
Looks like a linear function but is non-linear
It is the default activation function (hidden layer) for many neural networks
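A minimal NumPy sketch (the input values are assumptions):

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged, clips negative values to 0
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [0. 0. 0. 2.]
```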
Activation function to use for a regression problem at the output layer
Linear activation function
Activation function to use for binary classification at the output layer
Use sigmoid for the single output neuron in the output layer
Activation function to use for multiclass classification at the output layer
Use Softmax activation function, one output neuron per class
Activation function to use at the hidden layer
Common to start with ReLU activation function and use others to improve performance
Activation function to use at the input layer
No activation function
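A hedged Keras sketch tying these choices together for a multi-class classifier (layer sizes, input shape, and class count are assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),  # hidden layer: ReLU
    keras.layers.Dense(64, activation="relu"),                     # hidden layer: ReLU
    keras.layers.Dense(10, activation="softmax")                   # output: one neuron per class
])
# For regression the output layer would be Dense(1) with a linear (default) activation;
# for binary classification, Dense(1, activation="sigmoid").
```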
Backpropagation
A highly efficient algorithm that computes how to adjust the weight values in all the layers
It uses gradient descent and the chain rule to determine how to adjust the weights of each neuron in the network
The weight adjustment starts from the output layer (where the error is calculated) and works back towards the input layer
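A toy Python sketch of one forward pass, chain-rule backward pass, and gradient-descent update for a single linear neuron (all values are assumptions):

```python
# One training sample and a single linear neuron with squared-error loss
x, y_true = 2.0, 1.0
w, b, lr = 0.5, 0.0, 0.1            # initial weight, bias, learning rate

y_pred = w * x + b                  # forward pass
loss = (y_pred - y_true) ** 2       # error at the output

# Backward pass: chain rule from the loss back to the weight and bias
dloss_dpred = 2 * (y_pred - y_true)
dw = dloss_dpred * x                # d(loss)/d(w)
db = dloss_dpred                    # d(loss)/d(b)

w -= lr * dw                        # gradient-descent weight adjustment
b -= lr * db
print(loss, w, b)
```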
Loss Function
Measures the difference between the correct (ground truth) output and the predicted output
A loss function should return a high value for a bad prediction and a low value for a good prediction
Common types of loss functions
Binary Cross-entropy - for Binary Classification
Categorical Cross-entropy - for Multi-class Classification
Mean-Squared Error - for Regression
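A small NumPy sketch showing that a bad prediction gets a high loss and a good one gets a low loss (the example values are assumptions):

```python
import numpy as np

y_true = np.array([1.0, 0.0])        # ground truth for two binary samples
good   = np.array([0.9, 0.1])        # predictions close to the truth
bad    = np.array([0.2, 0.8])        # predictions far from the truth

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mean_squared_error(y, p):
    return np.mean((y - p) ** 2)

print(binary_cross_entropy(y_true, good), binary_cross_entropy(y_true, bad))  # ~0.105 vs ~1.609
print(mean_squared_error(y_true, good), mean_squared_error(y_true, bad))      # 0.01 vs 0.64
```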
Cross Entropy Loss Function
Also called Log loss function
Depending on how you encode your target label, you will use either categorical_crossentropy or sparse_categorical_crossentropy in Keras
Assume we have 3 different classes (0, 1, 2) and our target labels for two samples are [1, 2]; there are two ways of representing the target labels:
One-hot-encoded target labels: y_true = [[0, 1, 0], [0, 0, 1]]
Integer target labels: y_true = [1, 2]
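A hedged Keras sketch of the two options (the model and its 4-feature input shape are assumptions; only the loss/label pairing matters):

```python
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(3, activation="softmax", input_shape=(4,))])

# One-hot-encoded target labels -> categorical_crossentropy
model.compile(optimizer="adam", loss="categorical_crossentropy")
y_true = [[0, 1, 0], [0, 0, 1]]

# Integer target labels -> sparse_categorical_crossentropy
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
y_true = [1, 2]
```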
Optimizer
Adjusts the weights based on the errors in the prediction (as measured by the loss function), using gradient descent
Training Epochs and Training Steps
When training a neural network, we usually feed the network with a batch of samples, instead of a single sample at a time
Training Epoch
Refers to one iteration (forward pass + backward pass) over ALL training samples
Training step
Refers to one iteration (forward + backward pass) over a single batch of samples. Involves a gradient update of weights
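A quick worked example of the relationship (sample count, batch size, and epoch count are assumptions):

```python
num_samples, batch_size, epochs = 1000, 50, 10

steps_per_epoch = num_samples // batch_size   # 20 training steps (weight updates) per epoch
total_steps = steps_per_epoch * epochs        # 200 gradient updates in total
print(steps_per_epoch, total_steps)
```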
Learning Rate
Determines how fast the weights are adjusted by an optimizer during gradient descent
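A hedged Keras sketch of setting the learning rate on an optimizer (the value 0.001 and the model are assumptions):

```python
from tensorflow import keras

# A smaller learning rate means smaller weight adjustments per training step
optimizer = keras.optimizers.Adam(learning_rate=0.001)

model = keras.Sequential([keras.layers.Dense(1, input_shape=(5,))])
model.compile(optimizer=optimizer, loss="mean_squared_error")
```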
Width of a neural network
The number of units in a layer of a neural network
Depth of a neural network
The number of layers in a neural network
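A hedged Keras sketch: this network has a width of 128 units and a depth of 4 layers (all sizes are assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(10,)),  # width: 128 units per layer
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(1)                                           # depth: 4 layers in total
])
```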
Mean-Squared Error loss function
Used for regression problems, where the output is a continuous value. The output layer has a single unit (for single-output prediction)