Deep Learning - Computer Vision


1
New cards

what is deep learning

a subset of machine learning that uses artificial multi-layered neural networks to learn and simulate the complex decision making power of the human brain

2
New cards

what part of ML does DL fall under

mainly supervised learning, but has techniques that can be used in unsupervised and reinforcement learning

3
New cards

what is computer vision

a field in AI using DL to teach computers and systems to derive important information from images, videos, and other visual inputs, in order to make recommendations or take certain actions

4
New cards

what can computer vision be used for

  • image classification

  • object detection

  • face recognition

  • image synthesis

  • OCR - convert images to machine readable text

  • enhance image resolution

5
New cards

what is natural language processing

subfield of AI using ML and DL that enables computers to understand and communicate with human language e.g. recognise, generate, and understand text

6
New cards

what can natural language processing be used for

  • translate text

  • generate text

  • sentiment analysis

  • summarise text

  • converse with humans

7
New cards

what domains can deep learning be used on

  • self driving cars

  • medical imaging

  • robot navigation

8
New cards

what factors allowed deep learning to grow in popularity

  • data availability: better training datasets

  • GPU power: increased computation to train with large datasets and complex models

  • bigger NN models: can learn better and more complex patterns

  • hardware-constrained optimisation - using the biggest model that fits in the hardware we have

  • Moore’s law: exponential growth of computing power over time (the idea that the number of transistors on a single chip doubles every two years)

9
New cards

difference between a neural network and a perceptron

  • a perceptron is a single-layer neural network for linear binary classification, while a neural network is multi-layered (contains many hidden layers) and can perform more complex tasks, including unsupervised learning

10
New cards

what is the network architecture of a NN

  • input layer

  • many hidden layers

  • output layer, where each node in the output layer corresponds to a label in the dataset

11
New cards

how do computers view images

each image is made up of pixels with different values depending on the colour of the pixel.

12
New cards

what are the steps to train a NN for image classification (basic feed forward network)

  1. weights and bias initialised to random values

  2. each image in the dataset is given as an input to the NN

  3. convert/flatten each image

    1. i.e. the grid of pixels (a 2D array) becomes a vector (1D array) of input nodes, e.g. a 3×3 image gives 9 input nodes; each node in the FIRST hidden layer then has 9 connections - fully connected to every input node

  4. perform a forward pass - activation functions and neuron firing

  5. obtain output as a vector of values (predicted value) for each image

  6. compute the loss, e.g. using mean squared error

  7. after a batch, perform Backpropagation to compute the gradient of the loss for each weight

  8. update the weights by using an optimisation algorithm based on the gradients

  9. repeat the forward pass, loss computation, backward pass, and weight update for each batch in the training data until a full epoch has completed (see the sketch below)

  10. repeat for as many epochs as defined.
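
A minimal PyTorch sketch of this loop (the tiny network, the random placeholder data, and the hyperparameters are illustrative assumptions, not a prescribed setup):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder data: 100 random 3x3 "images" with one-hot targets for 2 labels
images = torch.rand(100, 1, 3, 3)
targets = torch.nn.functional.one_hot(torch.randint(0, 2, (100,)), 2).float()
train_loader = DataLoader(TensorDataset(images, targets), batch_size=8)

model = nn.Sequential(
    nn.Flatten(),                 # step 3: flatten each 3x3 grid into 9 input nodes
    nn.Linear(9, 16), nn.ReLU(),  # fully connected hidden layer (weights start random)
    nn.Linear(16, 2),             # one output node per label
)
loss_fn = nn.MSELoss()            # step 6: mean squared error
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5):                        # step 10: repeat for the defined epochs
    for batch_images, batch_targets in train_loader:
        preds = model(batch_images)           # steps 4-5: forward pass, predicted values
        loss = loss_fn(preds, batch_targets)  # step 6: compute the loss
        optimiser.zero_grad()
        loss.backward()                       # step 7: backpropagate gradients
        optimiser.step()                      # step 8: update the weights
```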

13
New cards

what optimisation algorithms can we use to fine-tune the weights

  1. Gradient Descent (Batch Gradient Descent)

  2. Stochastic Gradient Descent (SGD)

  3. Mini-Batch Gradient Descent

  4. Momentum

  5. Adagrad (Adaptive Gradient Algorithm)

  6. RMSprop (Root Mean Square Propagation)

  7. Adam (Adaptive Moment Estimation)

  8. Adadelta

  9. Nadam (Nesterov-accelerated Adaptive Moment Estimation)

  10. L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno)

14
New cards

what is the point of gradient descent

  • lower the loss function for a machine learning problem relatively quickly and reliably - not getting stuck at local minima, saddle points, or plateaus, but reaching the global minimum

  • optimise the values of all thetas (weights) such that the final combination of weights reduces the loss function to its global minimum (see the sketch below)
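
A toy sketch of the update rule theta := theta - learning_rate * gradient (the one-parameter quadratic loss and all values here are purely illustrative):

```python
import numpy as np

# toy loss L(theta) = (theta - 3)^2, whose global minimum is at theta = 3
theta = np.array([0.0])      # weight, initialised arbitrarily
lr = 0.1                     # learning rate

for step in range(50):
    grad = 2 * (theta - 3)   # dL/dtheta, computed analytically for this toy loss
    theta -= lr * grad       # gradient descent update

print(theta)                 # ~[3.], i.e. the global minimum
```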

15
New cards

what is an epoch

1 complete pass through the training data

16
New cards

what is a batch

a subset of the training data; the batch size is a hyperparameter defining how many samples to process before updating, instead of processing each data point individually, which improves efficiency

  • e.g. group images and compute mean error, and backpropagate the mean error to update the weights

  • common batch sizes range from 4 to 512, depending on the size of the images and available memory
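
A small PyTorch sketch of batching (the dataset shape and batch size of 32 are illustrative assumptions):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical dataset: 1000 single-channel 28x28 images with 10 class labels
dataset = TensorDataset(torch.rand(1000, 1, 28, 28),
                        torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # batch size hyperparameter

images, labels = next(iter(loader))
print(images.shape)   # torch.Size([32, 1, 28, 28]) - one batch of 32 samples
```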

17
New cards

how can neural networks be differentiated

shallow networks

deep networks

18
New cards

what is a shallow neural network

  • have one hidden layer or none

  • faster to train and less prone to overfitting due to smaller size

  • have limited capability to extract complex features

19
New cards

what is a deep neural network

  • has many hidden layers

  • more complex to train and prone to overfitting due to larger size

  • can extract the hierarchy of features

20
New cards

what is the difference between deep learning and machine learning

  • machine learning requires manual feature extraction and selection, and uses a classifier (machine learning model) with a shallow structure

  • neural networks have feature learning combined with the classifier

21
New cards

what are some different types of neural networks

  • recurrent neural networks RNN

  • deep convolutional neural networks DCN / CNN

  • long short term memory LSTM

22
New cards

what is a convolutional neural network

a type of neural network designed for three-dimensional data (width, height, and channels), used for image classification and object recognition tasks.

23
New cards

why do we need convolutional neural networks

better for large and complex images

  • classifying images using a basic feed-forward neural network doesn’t scale well with larger images, since as the number of hidden layers increases, the number of nodes and connections increases, with weights that need to be assigned to each connection - computationally expensive

    • e.g. for a 100×100-pixel image, 10k weights would need to be estimated per node in the first hidden layer

  • not clear if the NN will perform well if the image is shifted by one pixel

  • doesn’t take into consideration that there may be a correlation between pixels in an image, e.g. a picture of Pongu, blue pixels likely to be near other blue pixels

24
New cards

what do convolutional neural networks do differently to make image classification practical

  1. reduce number of input nodes (pooling)

  2. tolerate small shifts in where the pixels are in the image

  3. takes advantage of the correlation between pixels in complex images (using the filter to look at a region of pixels at a time)

25
New cards

what are the layers/ operations in a CNN

convolution

pooling

dropout

activation function: RELU

26
New cards

what is the convolution operation

applies a filter / kernel to the input image to produce a feature map

27
New cards

what is a filter / kernel

  • a smaller square (usually 3x3 pixels) that contains a value (weight) for each pixel - the values are randomised at first and then learned by backpropagation

  • looks at a group of pixels at a time and can detect local features like edges, textures, and patterns in an image

28
New cards

how does the convolution operation work

  • overlay the filter onto the input image and calculate the dot product for that section

    • dot product = multiply each overlapping pixel together and then sum the products (and then add bias after)

  • the dot product is placed on the feature map

  • shift the filter on the image and repeat the process until the whole feature map has been created (overlap allowed)
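
A numpy sketch of this sliding-window process (the 4x4 image, the filter values, and the step size of 1 are illustrative assumptions):

```python
import numpy as np

image = np.arange(16.0).reshape(4, 4)   # toy 4x4 single-channel image
kernel = np.array([[1.0, 0.0, -1.0],    # hypothetical 3x3 filter (9 weights)
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
bias = 0.0
step = 1                                # shift the filter by 1 pixel each time

out = image.shape[0] - kernel.shape[0] + 1   # 4 - 3 + 1 = 2, so a 2x2 feature map
feature_map = np.zeros((out, out))
for i in range(0, out, step):
    for j in range(0, out, step):
        patch = image[i:i+3, j:j+3]          # region the filter currently overlays
        feature_map[i, j] = np.sum(patch * kernel) + bias  # dot product, then bias
print(feature_map)
```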

29
New cards

what values do we need to determine to do the convolution operator

  • filter size - usually 3x3 means 9 weight parameters and 1 bias parameter

  • step size (stride) - how many pixels the filter shifts at each iteration, e.g. shift by 1 pixel to the right, or by 4 pixels to the right

30
New cards

what do we do after the convolution operator

run the feature map through the RELU activation function before passing it through the pooling operator

  • done to introduce non-linearity so that the NN can learn complex patterns (e.g. learning textures and abstract features instead of just edges)

31
New cards

what is the pooling operator

  • a way of sub-sampling, aka. reducing the dimension (width and height) of the input

  • reduce the resolution of the feature map while still retaining the important features for classification

32
New cards

how does pooling work

  • another filter / kernel is applied to the feature map

  • apply a pooling operator like Max or Average to the overlap and get a value

  • the filter moves so that it does not overlap, and covers a new area of the feature map

  • the process is repeated until the whole feature map has been covered and the Pooled Layer output has been produced
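
A short PyTorch sketch of non-overlapping 2x2 max pooling (the feature map values are illustrative):

```python
import torch
import torch.nn.functional as F

feature_map = torch.tensor([[[[1., 3., 2., 4.],
                              [5., 6., 7., 8.],
                              [3., 2., 1., 0.],
                              [1., 2., 3., 4.]]]])  # shape (batch, channels, 4, 4)

# 2x2 window with step 2, so the windows do not overlap
pooled = F.max_pool2d(feature_map, kernel_size=2, stride=2)
print(pooled)   # tensor([[[[6., 8.], [3., 4.]]]]) - the max of each 2x2 window
```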

33
New cards

what values need to be defined for the pooling operator

  • pooling size - similar to kernel size

  • step

  • type of operator used, e.g. max pooling or average pooling

34
New cards

what is max pooling

take the highest value from the area covered by the kernel

35
New cards

what is average pooling

calculates the average value from the area covered by the kernel

36
New cards

how do we convert the Pooled Layer into a neural network

turn the pooled layer into a vector / column of input nodes and use them as the input nodes for the NN
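
A minimal PyTorch sketch of this flattening step (the pooled shape and layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

pooled = torch.rand(1, 8, 5, 5)   # hypothetical pooled output: 8 maps of 5x5
vector = nn.Flatten()(pooled)     # one row of 8 * 5 * 5 = 200 input nodes
print(vector.shape)               # torch.Size([1, 200])
classifier = nn.Linear(200, 10)   # fully connected layer over the flattened vector
```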

37
New cards

what is the dropout layer

a regularisation technique where nodes are dropped, meaning the activation for that node is considered 0

  • helps the network generalise better and prevents overfitting

38
New cards

how does dropout occur

dropout rate p

  • during the training stage, a percentage of nodes are randomly set to 0

  • during the testing stage, dropout is not enabled; instead, activations (or outgoing weights) are scaled by the keep probability (1 - dropout rate) to compensate for all nodes being active, keeping the output consistent between training and testing (see the sketch below)
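
As an illustration, note that PyTorch implements the equivalent "inverted" variant: kept activations are scaled up by 1/(1 - p) during training instead, so no test-time scaling is needed:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # dropout rate p = 0.5
x = torch.ones(8)

drop.train()               # training stage: roughly half the nodes are zeroed
print(drop(x))             # e.g. tensor([2., 0., 2., 2., 0., ...]) - kept values scaled by 1/(1-p)

drop.eval()                # testing stage: dropout disabled
print(drop(x))             # tensor([1., 1., ...]) - all nodes pass through unchanged
```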

39
New cards

what are the different types of activation functions for convolutional neural networks

  • TanH

  • ReLU function - applied to the feature map before pooling

  • softmax function - applied to the output of the CNN

40
New cards

what is the RELU function

  • a type of activation function that we apply to our feature map before it gets pooled (essentially sets any negative values to 0)

  • most common function used for deep learning

41
New cards

what is the SoftMax function

a nonlinear function that maps a vector of unbounded real-valued inputs into a vector of probabilities that sums to 1

42
New cards

how is the softmax function used

used as the activation function of the last layer of the neural network

  • the output node with the highest probability gives the decision, e.g. a node with probability 0.9 has the highest activation, so it is the most likely classification

  • that output node will fire, remaining nodes will not fire
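
A quick PyTorch illustration (the logit values are arbitrary):

```python
import torch

logits = torch.tensor([1.2, 0.3, 4.0])   # raw real-valued outputs of the last layer
probs = torch.softmax(logits, dim=0)
print(probs)             # roughly tensor([0.056, 0.023, 0.921]) - sums to 1
print(probs.argmax())    # tensor(2) - the node with the highest probability "fires"
```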

43
New cards

what are examples of CNN architectures (aka. implementations)

  • LeNet architecture

  • AlexNet architecture

each takes the principles of CNNs and implements them in different ways, e.g. different numbers of layers and filters, unstacked vs stacked convolutional layers, different activation functions used

44
New cards

what is transfer learning

an ML technique where a model developed for a particular task is reused as the starting point for another model doing another task (sketched below)
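
A common sketch of this with torchvision (API of torchvision 0.13+; the 5-class target task is a placeholder assumption):

```python
import torch.nn as nn
import torchvision.models as models

# starting point: a model pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# freeze the pre-trained feature-extraction layers
for param in model.parameters():
    param.requires_grad = False

# replace the final layer so the model can be fine-tuned on a new 5-class task
model.fc = nn.Linear(model.fc.in_features, 5)
```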

45
New cards

what are the advantages of deep learning compared to classical ML

  • can outperform ML algorithms on most tasks

  • can learn directly from raw data without needing manual feature extraction, which is good for image and speech recognition

  • the same NN architectures can be used for different tasks with only minor modifications

  • DL models scale well with more data; computational inference time stays the same but performance usually increases

  • transfer learning by fine-tuning is easy and common

46
New cards

what are the disadvantages of deep learning compared to classical ML

  • prone to overfitting on small datasets, so large labelled datasets are required for training to prevent this

  • resource intensive and takes a long time to train, especially with large datasets (hours to weeks)

  • black box - meaning it is difficult to understand the reasoning behind its predictions

  • hyperparameters can significantly affect performance, e.g. number of layers, type of layers, learning rate, batch size

  • vulnerable to adversarial attacks, where an unnoticed change in the input can cause the model to make incorrect predictions

47
New cards

why is it important to monitor the loss of a NN

monitoring the loss helps us monitor the performance of the NN

48
New cards

what is a validation dataset

The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while fine-tuning model hyperparameters.

49
New cards

how does the validation set differ from training and testing set

  • different from training, as training actually makes the NN learn the patterns (updates the weights), while validation only guides the tuning of hyperparameters on top of what has already been learnt

  • different from the test set, as validation is used for fine-tuning and so influences the model, while the test set is only used for evaluation and does not change the model
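
A minimal sketch of carving out the three splits with PyTorch (the dataset and the 70/15/15 ratio are illustrative assumptions):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# hypothetical dataset: 1000 labelled single-channel 28x28 images
dataset = TensorDataset(torch.rand(1000, 1, 28, 28),
                        torch.randint(0, 10, (1000,)))

# train learns the weights, validation tunes hyperparameters,
# test is held out for the final, unbiased evaluation
train_set, val_set, test_set = random_split(dataset, [700, 150, 150])
```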

50
New cards

why could a neural network model not perform well

  • not enough training data

  • not trained for long enough - underfitting

  • trained for too long - overfitting

  • testing data distribution is different from training data

51
New cards

what programming libraries can be used to create a NN

  • tensorflow

  • keras

  • pytorch
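
Putting the pieces together, a minimal CNN in PyTorch (the layer sizes are illustrative and assume 1×28×28 inputs):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),  # convolution: 8 filters of 3x3, stride 1
    nn.ReLU(),                       # activation applied to the feature maps
    nn.MaxPool2d(kernel_size=2),     # pooling: halve the width and height
    nn.Flatten(),                    # pooled maps -> vector of input nodes
    nn.Dropout(p=0.25),              # regularisation by dropping nodes
    nn.Linear(8 * 13 * 13, 10),      # fully connected layer, 10 output nodes
)
# softmax is applied implicitly by nn.CrossEntropyLoss during training
```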