Deep Learning - Computer Vision


1
New cards

what is deep learning

a subset of machine learning that uses artificial multi-layered neural networks to learn and simulate the complex decision making power of the human brain

2
New cards

what part of ML does DL fall under

mainly supervised learning, but has techniques that can be used in unsupervised and reinforcement learning

3
New cards

what is computer vision

a field in AI using DL to teach computers and systems to derive important information from images, videos, and other visual inputs, in order to make recommendations or take certain actions

4
New cards

what can computer vision be used for

  • image classification

  • object detection

  • face recognition

  • image synthesis

  • OCR - convert images to machine readable text

  • enhance image resolution

5
New cards

what is natural language processing

subfield of AI using ML and DL that enables computers to understand and communicate with human language e.g. recognise, generate, and understand text

6
New cards

what can natural language processing be used for

  • translate text

  • generate text

  • sentiment analysis

  • summarise text

  • converse with humans

7
New cards

what domains can deep learning be used on

  • self driving cars

  • medical imaging

  • robot navigation

8
New cards

what factors allowed deep learning to grow in popularity

  • data availability: better training datasets

  • GPU power: increased computation to train with large datasets and complex models

  • bigger NN models: can learn better and more complex patterns

  • hardware-constrained optimisation - using the biggest model that fits in the hardware we have

  • Moore’s law: exponential growth of computing power over time (the idea that the number of transistors on a single chip doubles every two years)

9
New cards

difference between a neural network and a perceptron

  • a perceptron is a single-layer neural network for linear binary classification, while a neural network is multi-layered (contains many hidden layers) and can perform more complex tasks, including unsupervised learning

10
New cards

what is the network architecture of a NN

  • input layer

  • many hidden layers

  • output layer, where each node in the output layer corresponds to a label in the dataset

11
New cards

how do computers view images

each image is made up of pixels with different values depending on the colour of the pixel.

12
New cards

what are the steps to train a NN for image classification (basic feed forward network)

  1. weights and bias initialised to random values

  2. each image in the dataset is given as an input to the NN

  3. convert/flatten each image

    1. i.e. the grid of pixels (a 2D array) becomes a vector (1D array) of input nodes, e.g. a 3×3 image gives 9 input nodes; each node in the FIRST hidden layer then has 9 connections - fully connected to every input node

  4. perform a forward pass - activation functions and neuron firing

  5. obtain output as a vector of values (predicted value) for each image

  6. compute the loss, e.g. using mean squared error

  7. after a batch, perform Backpropagation to compute the gradient of the loss for each weight

  8. update the weights by using an optimisation algorithm based on the gradients

  9. repeat the forward pass, loss computation, backward pass, and weight update for each batch in the training data until a full epoch has completed (see the sketch below)

  10. repeat for as many epochs as defined.
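
A minimal PyTorch sketch of this loop (the tiny network, the random placeholder data, and the hyperparameters are illustrative assumptions, not a prescribed setup):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder data: 100 random 3x3 "images" with one-hot targets for 2 labels
images = torch.rand(100, 1, 3, 3)
targets = torch.nn.functional.one_hot(torch.randint(0, 2, (100,)), 2).float()
train_loader = DataLoader(TensorDataset(images, targets), batch_size=8)

model = nn.Sequential(
    nn.Flatten(),                 # step 3: flatten each 3x3 grid into 9 input nodes
    nn.Linear(9, 16), nn.ReLU(),  # fully connected hidden layer (weights start random)
    nn.Linear(16, 2),             # one output node per label
)
loss_fn = nn.MSELoss()            # step 6: mean squared error
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5):                        # step 10: repeat for the defined epochs
    for batch_images, batch_targets in train_loader:
        preds = model(batch_images)           # steps 4-5: forward pass, predicted values
        loss = loss_fn(preds, batch_targets)  # step 6: compute the loss
        optimiser.zero_grad()
        loss.backward()                       # step 7: backpropagate gradients
        optimiser.step()                      # step 8: update the weights
```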

13
New cards

what optimisation algorithms can we use to fine-tune the weights

  1. Gradient Descent (Batch Gradient Descent)

  2. Stochastic Gradient Descent (SGD)

  3. Mini-Batch Gradient Descent

  4. Momentum

  5. Adagrad (Adaptive Gradient Algorithm)

  6. RMSprop (Root Mean Square Propagation)

  7. Adam (Adaptive Moment Estimation)

  8. Adadelta

  9. Nadam (Nesterov-accelerated Adaptive Moment Estimation)

  10. L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno)

14
New cards

what is the point of gradient descent

  • lower the loss function for a machine learning problem relatively quickly and reliably - not getting stuck at local minima, saddle points, or plateaus, but reaching the global minimum

  • optimise the values of all thetas (weights) such that the final combination of weights reduces the loss function to its global minimum (see the sketch below)
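
A toy sketch of the update rule theta := theta - learning_rate * gradient (the one-parameter quadratic loss and all values here are purely illustrative):

```python
import numpy as np

# toy loss L(theta) = (theta - 3)^2, whose global minimum is at theta = 3
theta = np.array([0.0])      # weight, initialised arbitrarily
lr = 0.1                     # learning rate

for step in range(50):
    grad = 2 * (theta - 3)   # dL/dtheta, computed analytically for this toy loss
    theta -= lr * grad       # gradient descent update

print(theta)                 # ~[3.], i.e. the global minimum
```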

15
New cards

what is an epoch

1 complete pass through the training data

16
New cards

what is a batch

a subset of the training data; the batch size is a hyperparameter defining how many samples to process before updating, instead of processing each data point individually, which improves efficiency

  • e.g. group images and compute mean error, and backpropagate the mean error to update the weights

  • common batch sizes range from 4 to 512, depending on the size of the images and available memory
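
A small PyTorch sketch of batching (the dataset shape and batch size of 32 are illustrative assumptions):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical dataset: 1000 single-channel 28x28 images with 10 class labels
dataset = TensorDataset(torch.rand(1000, 1, 28, 28),
                        torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # batch size hyperparameter

images, labels = next(iter(loader))
print(images.shape)   # torch.Size([32, 1, 28, 28]) - one batch of 32 samples
```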

17
New cards

how can neural networks be differentiated

shallow networks

deep networks

18
New cards

what is a shallow neural network

  • have one hidden layer or none

  • faster to train and less prone to overfitting due to smaller size

  • have limited capability to extract complex features

19
New cards

what is a deep neural network

  • has many hidden layers

  • more complex to train and prone to overfitting due to larger size

  • can extract the hierarchy of features

20
New cards

what is the difference between deep learning and machine learning

  • machine learning requires manual feature extraction and selection, and uses a classifier (machine learning model) with a shallow structure

  • neural networks have feature learning combined with the classifier

21
New cards

what are some different types of neural networks

  • recurrent neural networks RNN

  • deep convolutional neural networks DCN / CNN

  • long short term memory LSTM

22
New cards

what is a convolutional neural network

a type of neural network designed for three-dimensional data (width, height, and channels), used for image classification and object recognition tasks.

23
New cards

why do we need convolutional neural networks

better for large and complex images

  • classifying images using a basic feed-forward neural network doesn’t scale well with larger images, since as the number of hidden layers increases, the number of nodes and connections increases, with weights that need to be assigned to each connection - computationally expensive

    • e.g. for a 100×100-pixel image, 10k weights would need to be estimated per node in the first hidden layer

  • not clear if the NN will perform well if the image is shifted by one pixel

  • doesn’t take into consideration that there may be a correlation between pixels in an image, e.g. a picture of Pongu, blue pixels likely to be near other blue pixels

24
New cards

what do convolutional neural networks do differently to make image classification practical

  1. reduce number of input nodes (pooling)

  2. tolerate small shifts in where the pixels are in the image

  3. takes advantage of the correlation between pixels in complex images (using the filter to look at a region of pixels at a time)

25
New cards

what are the layers/ operations in a CNN

convolution

pooling

dropout

activation function: RELU

26
New cards

what is the convolution operation

applies a filter / kernel to the input image to produce a feature map

27
New cards

what is a filter / kernel

  • a smaller square (usually 3x3 pixels) that contains a value (weight) for each pixel - the values are randomised at first and then learned by backpropagation

  • looks at a group of pixels at a time and can detect local features like edges, textures, and patterns in an image

28
New cards

how does the convolution operation work

  • overlay the filter onto the input image and calculate the dot product for that section

    • dot product = multiply each overlapping pixel together and then sum the products (and then add bias after)

  • the dot product is placed on the feature map

  • shift the filter on the image and repeat the process until the whole feature map has been created (overlap allowed)
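
A numpy sketch of this sliding-window process (the 4x4 image, the filter values, and the step size of 1 are illustrative assumptions):

```python
import numpy as np

image = np.arange(16.0).reshape(4, 4)   # toy 4x4 single-channel image
kernel = np.array([[1.0, 0.0, -1.0],    # hypothetical 3x3 filter (9 weights)
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
bias = 0.0
step = 1                                # shift the filter by 1 pixel each time

out = image.shape[0] - kernel.shape[0] + 1   # 4 - 3 + 1 = 2, so a 2x2 feature map
feature_map = np.zeros((out, out))
for i in range(0, out, step):
    for j in range(0, out, step):
        patch = image[i:i+3, j:j+3]          # region the filter currently overlays
        feature_map[i, j] = np.sum(patch * kernel) + bias  # dot product, then bias
print(feature_map)
```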

29
New cards

what values do we need to determine to do the convolution operator

  • filter size - usually 3x3 means 9 weight parameters and 1 bias parameter

  • step size (stride) - how many pixels the filter shifts at each iteration, e.g. shift by 1 pixel to the right, or by 4 pixels to the right

30
New cards

what do we do after the convolution operator

run the feature map through the RELU activation function before passing it through the pooling operator

  • done to introduce non-linearity so that the NN can learn complex patterns (e.g. learning textures and abstract features instead of just edges)

31
New cards

what is the pooling operator

  • a way of sub-sampling, aka. reducing the dimension (width and height) of the input

  • reduce the resolution of the feature map while still retaining the important features for classification

32
New cards

how does pooling work

  • another filter / kernel is applied to the feature map

  • apply a pooling operator like Max or Average to the overlap and get a value

  • the filter moves so that it does not overlap, and covers a new area of the feature map

  • the process is repeated until the whole feature map has been covered and the Pooled Layer output has been produced
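
A short PyTorch sketch of non-overlapping 2x2 max pooling (the feature map values are illustrative):

```python
import torch
import torch.nn.functional as F

feature_map = torch.tensor([[[[1., 3., 2., 4.],
                              [5., 6., 7., 8.],
                              [3., 2., 1., 0.],
                              [1., 2., 3., 4.]]]])  # shape (batch, channels, 4, 4)

# 2x2 window with step 2, so the windows do not overlap
pooled = F.max_pool2d(feature_map, kernel_size=2, stride=2)
print(pooled)   # tensor([[[[6., 8.], [3., 4.]]]]) - the max of each 2x2 window
```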

33
New cards

what values need to be defined for the pooling operator

  • pooling size - similar to kernel size

  • step

  • type of operator used, e.g. max pooling or average pooling

34
New cards

what is max pooling

take the highest value from the area covered by the kernel

35
New cards

what is average pooling

calculates the average value from the area covered by the kernel

36
New cards

how do we convert the Pooled Layer into a neural network

turn the pooled layer into a vector / column of input nodes and use them as the input nodes for the NN
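
A minimal PyTorch sketch of this flattening step (the pooled shape and layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

pooled = torch.rand(1, 8, 5, 5)   # hypothetical pooled output: 8 maps of 5x5
vector = nn.Flatten()(pooled)     # one row of 8 * 5 * 5 = 200 input nodes
print(vector.shape)               # torch.Size([1, 200])
classifier = nn.Linear(200, 10)   # fully connected layer over the flattened vector
```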

37
New cards

what is the dropout layer

a regularisation technique where nodes are dropped, meaning the activation for that node is considered 0

  • helps the network generalise better and prevents overfitting

38
New cards

how does dropout occur

dropout rate p

  • during the training stage, a percentage of nodes are randomly set to 0

  • during the testing stage, dropout is not enabled; instead, activations (or outgoing weights) are scaled by the keep probability (1 - dropout rate) to compensate for all nodes being active, keeping the output consistent between training and testing (see the sketch below)
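
As an illustration, note that PyTorch implements the equivalent "inverted" variant: kept activations are scaled up by 1/(1 - p) during training instead, so no test-time scaling is needed:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # dropout rate p = 0.5
x = torch.ones(8)

drop.train()               # training stage: roughly half the nodes are zeroed
print(drop(x))             # e.g. tensor([2., 0., 2., 2., 0., ...]) - kept values scaled by 1/(1-p)

drop.eval()                # testing stage: dropout disabled
print(drop(x))             # tensor([1., 1., ...]) - all nodes pass through unchanged
```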

39
New cards

what are the different types of activation functions for convolutional neural networks

  • TanH

  • ReLU function - applied to the feature map before pooling

  • softmax function - applied to the output of the CNN

40
New cards

what is the RELU function

  • a type of activation function that we apply to our feature map before it gets pooled (essentially sets any negative values to 0)

  • most common function used for deep learning

41
New cards

what is the SoftMax function

a nonlinear function that maps a vector of unbounded real-valued inputs into a vector of probabilities that sums to 1

42
New cards

how is the softmax function used

used as the activation function of the last layer of the neural network

  • the output node with the highest probability gives the decision, e.g. a node with probability 0.9 has the highest activation, so it is the most likely classification

  • that output node will fire, remaining nodes will not fire
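
A quick PyTorch illustration (the logit values are arbitrary):

```python
import torch

logits = torch.tensor([1.2, 0.3, 4.0])   # raw real-valued outputs of the last layer
probs = torch.softmax(logits, dim=0)
print(probs)             # roughly tensor([0.056, 0.023, 0.921]) - sums to 1
print(probs.argmax())    # tensor(2) - the node with the highest probability "fires"
```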

43
New cards

what are examples of CNN architectures (aka. implementations)

  • LeNet architecture

  • AlexNet architecture

each takes the principles of CNNs and implements them in different ways, e.g. different numbers of layers and filters, unstacked vs stacked convolutional layers, different activation functions used

44
New cards

what is transfer learning

an ML technique where a model developed for a particular task is reused as the starting point for another model doing another task (sketched below)
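
A common sketch of this with torchvision (API of torchvision 0.13+; the 5-class target task is a placeholder assumption):

```python
import torch.nn as nn
import torchvision.models as models

# starting point: a model pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# freeze the pre-trained feature-extraction layers
for param in model.parameters():
    param.requires_grad = False

# replace the final layer so the model can be fine-tuned on a new 5-class task
model.fc = nn.Linear(model.fc.in_features, 5)
```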

45
New cards

what are the advantages of deep learning compared to classical ML

  • can outperform ML algorithms on most tasks

  • can learn directly from raw data without needing manual feature extraction, which is good for image and speech recognition

  • the same NN architectures can be used for different tasks with only minor modifications

  • DL models scale well with more data; computational inference time stays the same but performance usually increases

  • transfer learning by fine-tuning is easy and common

46
New cards

what are the disadvantages of deep learning compared to classical ML

  • prone to overfitting on small datasets, so large labelled datasets are required for training to prevent this

  • resource intensive and takes a long time to train, especially with large datasets (hours to weeks)

  • black box - meaning it is difficult to understand the reasoning behind its predictions

  • hyperparameters can significantly affect performance, e.g. number of layers, type of layers, learning rate, batch size

  • vulnerable to adversarial attacks, where an unnoticed change in the input can cause the model to make incorrect predictions

47
New cards

why is it important to monitor the loss of a NN

monitoring the loss helps us monitor the performance of the NN

48
New cards

what is a validation dataset

The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while fine-tuning model hyperparameters.

49
New cards

how does the validation set differ from training and testing set

  • different from training, as training actually makes the NN learn the patterns (updates the weights), while validation only guides the tuning of hyperparameters on top of what has already been learnt

  • different from the test set, as validation is used for fine-tuning and so influences the model, while the test set is only used for evaluation and does not change the model
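
A minimal sketch of carving out the three splits with PyTorch (the dataset and the 70/15/15 ratio are illustrative assumptions):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# hypothetical dataset: 1000 labelled single-channel 28x28 images
dataset = TensorDataset(torch.rand(1000, 1, 28, 28),
                        torch.randint(0, 10, (1000,)))

# train learns the weights, validation tunes hyperparameters,
# test is held out for the final, unbiased evaluation
train_set, val_set, test_set = random_split(dataset, [700, 150, 150])
```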

50
New cards

why could a neural network model not perform well

  • not enough training data

  • not trained for long enough - underfitting

  • trained for too long - overfitting

  • testing data distribution is different from training data

51
New cards

what programming libraries can be used to create a NN

  • tensorflow

  • keras

  • pytorch
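
Putting the pieces together, a minimal CNN in PyTorch (the layer sizes are illustrative and assume 1×28×28 inputs):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),  # convolution: 8 filters of 3x3, stride 1
    nn.ReLU(),                       # activation applied to the feature maps
    nn.MaxPool2d(kernel_size=2),     # pooling: halve the width and height
    nn.Flatten(),                    # pooled maps -> vector of input nodes
    nn.Dropout(p=0.25),              # regularisation by dropping nodes
    nn.Linear(8 * 13 * 13, 10),      # fully connected layer, 10 output nodes
)
# softmax is applied implicitly by nn.CrossEntropyLoss during training
```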