Artificial Neural Network (ANN)
Consists of many computation units called neurons
Learns a complex mapping function that maps any input X to any output Y by training with lots of data
Deep Learning
An ANN with many layers, which allows it to learn increasingly complex concepts out of simpler ones
Artificial Neuron
Multiplies each input feature by a corresponding weight and then adds these values together with a bias term
This value is then put through a function called the activation function
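A minimal Python/NumPy sketch of a single neuron (the feature, weight, and bias values are assumptions for illustration):

```python
import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, passed through a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input features (assumed values)
w = np.array([0.4, 0.1, -0.7])   # one weight per input feature
b = 0.2                          # bias term
print(neuron(x, w, b))
```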
Recurrent Neural Network (RNN)
Neurons receive feedback from their own outputs
Used for processing data with time or sequence information
Successfully used for machine translation, language modelling and time series prediction
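A hedged Keras sketch of a simple recurrent layer (sequence length, feature count, and unit count are assumptions):

```python
from tensorflow import keras

# Reads a sequence of 10 time steps, each with 8 features; the recurrent
# layer feeds its own output back in at every step.
model = keras.Sequential([
    keras.layers.SimpleRNN(32, input_shape=(10, 8)),
    keras.layers.Dense(1)   # e.g. next-value prediction for a time series
])
model.summary()
```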
Convolutional Neural Network (CNN)
Neurons are connected to small sets of inputs only
Efficient for computer vision tasks such as object detection and image classification
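A hedged Keras sketch of a small convolutional network (image size, filter count, and class count are assumptions):

```python
from tensorflow import keras

# Each convolutional neuron looks only at a small 3x3 patch of the image.
model = keras.Sequential([
    keras.layers.Conv2D(16, kernel_size=3, activation="relu",
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax")   # e.g. 10 image classes
])
model.summary()
```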
Purpose of Activation Functions
Firing Decision
Helps decide whether a neuron should fire: it fires only if it is relevant to the prediction (acts as a mathematical gate)
Bounded Values
Some activation functions provide a bound to the output values. This provides more stability during training
Non-linearity
Introduce non-linearity to the network
Most interesting real-world problems are non-linear in nature, which requires a non-linear neural network to handle them
Types of Activation Functions
Linear Function
Sigmoid/Logistic
Tanh (Hyperbolic Tangent)
SoftMax
RELU (Rectified Linear Unit)
Leaky ReLU (Leaky Rectified Linear Unit)
SELU (Scaled Exponential Linear Unit)
PReLU (Parametric Rectified Linear Unit)
SoftPlus
Linear Function
g(x) = x
Output is proportional to the input multiplied by weight
Generally used for regression at the output layer
Sigmoid Function
g(x) = 1/(1+e^-x)
Output values bounded between 0 and 1
Generally used for binary prediction at the output layer
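A minimal NumPy sketch showing the bounded (0, 1) output (the input values are assumptions):

```python
import numpy as np

def sigmoid(x):
    # Maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # approx [0.00005, 0.5, 0.99995]
```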
TanH Function
g(x) = tanh(x)
Default activation for RNN layer
Softmax Function
One output for each class
Values are normalized to between 0 and 1 such that the values for all classes sum to 1
This allows comparison and thus supports multi-class classification
Commonly used in output layer for a multi-class classifier with the cross-entropy loss function
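A minimal NumPy sketch of softmax over three class scores (the score values are assumptions):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize so outputs sum to 1
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs, probs.sum())            # approx [0.659 0.242 0.099], sums to 1.0
```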
RELU (Rectified Linear Unit) Function
g(x) = max(0, x)
Looks like a linear function but is non-linear
It is the default activation function (hidden layer) for many neural networks
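A minimal NumPy sketch (the input values are assumptions):

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged, clips negative values to 0
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [0. 0. 0. 2.]
```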
Activation function to use for a regression problem at the output layer
Linear activation function
Activation function to use for binary classification at the output layer
Use sigmoid for the single output neuron in the output layer
Activation function to use for multiclass classification at the output layer
Use Softmax activation function, one output neuron per class
Activation function to use at the hidden layer
Common to start with ReLU activation function and use others to improve performance
Activation function to use at the input layer
No activation function
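A hedged Keras sketch tying these choices together for a multi-class classifier (layer sizes, input shape, and class count are assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),  # hidden layer: ReLU
    keras.layers.Dense(64, activation="relu"),                     # hidden layer: ReLU
    keras.layers.Dense(10, activation="softmax")                   # output: one neuron per class
])
# For regression the output layer would be Dense(1) with a linear (default) activation;
# for binary classification, Dense(1, activation="sigmoid").
```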
Backpropagation
A highly efficient algorithm that computes how to adjust the weight values in all the layers
It uses gradient descent and the chain rule to determine how to adjust the weights of each neuron in the network
The weight adjustment starts from the output layer (where the error is calculated) and works back towards the input layer
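A toy Python sketch of one forward pass, chain-rule backward pass, and gradient-descent update for a single linear neuron (all values are assumptions):

```python
# One training sample and a single linear neuron with squared-error loss
x, y_true = 2.0, 1.0
w, b, lr = 0.5, 0.0, 0.1            # initial weight, bias, learning rate

y_pred = w * x + b                  # forward pass
loss = (y_pred - y_true) ** 2       # error at the output

# Backward pass: chain rule from the loss back to the weight and bias
dloss_dpred = 2 * (y_pred - y_true)
dw = dloss_dpred * x                # d(loss)/d(w)
db = dloss_dpred                    # d(loss)/d(b)

w -= lr * dw                        # gradient-descent weight adjustment
b -= lr * db
print(loss, w, b)
```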
Loss Function
Measures the difference between the correct (ground truth) output and the predicted output
A loss function should return a high value for a bad prediction and a low value for a good prediction
Common types of loss functions
Binary Cross-entropy - for Binary Classification
Categorical Cross-entropy - for Multi-class Classification
Mean-Squared Error - for Regression
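A small NumPy sketch showing that a bad prediction gets a high loss and a good one gets a low loss (the example values are assumptions):

```python
import numpy as np

y_true = np.array([1.0, 0.0])        # ground truth for two binary samples
good   = np.array([0.9, 0.1])        # predictions close to the truth
bad    = np.array([0.2, 0.8])        # predictions far from the truth

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mean_squared_error(y, p):
    return np.mean((y - p) ** 2)

print(binary_cross_entropy(y_true, good), binary_cross_entropy(y_true, bad))  # ~0.105 vs ~1.609
print(mean_squared_error(y_true, good), mean_squared_error(y_true, bad))      # 0.01 vs 0.64
```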
Cross Entropy Loss Function
Also called Log loss function
Depending on how you encode your target label, you will use either categorical_crossentropy or sparse_categorical_crossentropy in Keras
Assume we have 3 different classes (0, 1, 2) and our target labels for two samples are [1, 2]; there are two ways of representing the target labels:
One-hot-encoded target labels: y_true = [[0, 1, 0], [0, 0, 1]]
Integer target labels: y_true = [1, 2]
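A hedged Keras sketch of the two options (the model and its 4-feature input shape are assumptions; only the loss/label pairing matters):

```python
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(3, activation="softmax", input_shape=(4,))])

# One-hot-encoded target labels -> categorical_crossentropy
model.compile(optimizer="adam", loss="categorical_crossentropy")
y_true = [[0, 1, 0], [0, 0, 1]]

# Integer target labels -> sparse_categorical_crossentropy
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
y_true = [1, 2]
```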
Optimizer
Adjusts the weights based on the errors in the prediction (as measured by the loss function), using gradient descent
Training Epochs and Training Steps
When training a neural network, we usually feed the network with a batch of samples, instead of a single sample at a time
Training Epoch
Refers to one iteration (forward pass + backward pass) over ALL training samples
Training step
Refers to one iteration (forward + backward pass) over a single batch of samples. Involves a gradient update of weights
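A quick worked example of the relationship (sample count, batch size, and epoch count are assumptions):

```python
num_samples, batch_size, epochs = 1000, 50, 10

steps_per_epoch = num_samples // batch_size   # 20 training steps (weight updates) per epoch
total_steps = steps_per_epoch * epochs        # 200 gradient updates in total
print(steps_per_epoch, total_steps)
```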
Learning Rate
Determines how fast the weights are adjusted by an optimizer during gradient descent
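A hedged Keras sketch of setting the learning rate on an optimizer (the value 0.001 and the model are assumptions):

```python
from tensorflow import keras

# A smaller learning rate means smaller weight adjustments per training step
optimizer = keras.optimizers.Adam(learning_rate=0.001)

model = keras.Sequential([keras.layers.Dense(1, input_shape=(5,))])
model.compile(optimizer=optimizer, loss="mean_squared_error")
```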
Width of a neural network
The number of units in a layer of a neural network
Depth of a neural network
The number of layers in a neural network
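A hedged Keras sketch: this network has a width of 128 units and a depth of 4 layers (all sizes are assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(10,)),  # width: 128 units per layer
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(1)                                           # depth: 4 layers in total
])
```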
Mean-Squared Error loss function
Used for regression problems, where the output is a continuous value. The output layer has a single unit (for single-output prediction)