Computer Vision
The design of computer systems that possess the ability to capture, understand, and interpret important visual information contained within image and video data.
Best approach: neural networks and deep learning
Image representation
The features of an image are broken down into a set of pixel intensity values
Traditional vs. New Machine Learning (image recognition)
Traditional: Input image data→feature extractor→features→machine learning algorithm→output classification
Deep learning: Input image data→deep learning algorithm→output classification
Deep learning
A subfield of machine learning inspired by how the brain is structured and operates
Term became popular in the mid-2000s
“Deep” refers to the number of hidden layers in the neural network
Neural Networks
A model of reasoning based on the brain: a nonlinear, parallel information-processing system
Neuron
Basic information processing unit
Soma - Axon - Dendrites
The brain has nearly 100 billion neurons and 60 trillion connections (synapses)
Artificial Neural Networks (ANN)
Consists of a number of very simple processors (neurons/perceptrons) connected by weighted links passing signals from one to another.
Weights can either be excitatory (positive value, increases probability of a neuron firing) or inhibitory (negative value, decreases probability of a neuron firing)
McCulloch & Pitts (1943)
Defined the first mathematical model of an artificial neuron (the basis of the later perceptron)
Activation
The weighted sum of inputs to a perceptron
Activation Function
A function that calculates the output of a perceptron based on the weighted sum of its inputs. Note that activation functions are non-linear, which allows the network to approximate complex functions.
Examples: step, sign, sigmoid, ReLU, softmax
Also called a squashing function if it maps outputs to the range [0,1] or [-1,1]
Rectified Linear Unit (ReLU)
The most popular modern choice of activation function. Linear for all positive values, zero for all negative values. Quick to compute
y = max(0,x)
Good for CNNs, as only nodes with positive activation are used; this reduces the amount of processing necessary and reduces noise in the network
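A minimal numpy sketch of the activation functions named above (step, sign, sigmoid, ReLU, softmax); the sample input values are just for illustration.

```python
import numpy as np

def step(x):
    return np.where(x >= 0, 1, 0)        # fires (1) once the threshold is reached

def sign(x):
    return np.where(x >= 0, 1, -1)       # maps to the range [-1, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes to the range (0, 1)

def relu(x):
    return np.maximum(0, x)              # y = max(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))            # subtract the max for numerical stability
    return e / e.sum()                   # outputs sum to 1 (a probability distribution)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))     # [0.  0.  0.  1.5]
print(sigmoid(x))  # values between 0 and 1
```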
Bias
An additional learnable parameter for each neuron, shifts the activation function up or down (i.e. shifts threshold for neuron firing)
Backpropagation
Algorithm by which weights can be adjusted, and thus the model can learn
Forward pass
Feed inputs X into the input layer
The input is transformed using the current weights W
Feed the result forward through each hidden layer to the output layer
Backward pass
Calculate the error in the outputs: Error = Output - Target
Travel back from the output layer through the hidden layers, adjusting the weights so that the error is decreased.
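A minimal numpy sketch of one forward and one backward pass for a tiny fully connected network (2 inputs, 2 hidden neurons, 1 output, sigmoid activations); the layer sizes, learning rate, and example data are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0.5, 0.8]])           # one input example (illustrative)
target = np.array([[1.0]])           # desired output
W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))
lr = 0.1                             # learning rate (illustrative)

# Forward pass: propagate the input through each layer using the current weights
hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

# Backward pass: Error = Output - Target, then adjust weights to decrease the error
error = output - target
d_output = error * output * (1 - output)               # gradient at the output layer
d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)   # error propagated back to the hidden layer

W2 -= lr * hidden.T @ d_output
b2 -= lr * d_output
W1 -= lr * X.T @ d_hidden
b1 -= lr * d_hidden
```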
Fully connected neural network
Every neuron in one layer is connected to every neuron in the next layer
The most basic and very popular type of network
Recurrent Neural Network (RNN)
Works with sequence prediction problems
Processes prior inputs across time, in addition to the current input
Use for: text, speech, classification prediction, regression prediction
Don’t use for: tabular data, image data
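A minimal Keras sketch of an RNN for a sequence regression problem; the layer size, sequence shape, and loss are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 20, 1   # e.g. 20 steps of a univariate sequence (illustrative)

model = keras.Sequential([
    keras.Input(shape=(timesteps, features)),
    layers.SimpleRNN(32),     # processes prior inputs across time, in addition to the current input
    layers.Dense(1),          # regression prediction for the next value
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```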
Convolutional Neural Network (CNN)
Maps image data to an output variable
Ability to develop an internal representation of a two-dimensional image
Good for handwriting recognition and natural language processing
Use for: image data, classification prediction, regression prediction
Architecture: Input image→convolution→pooling→flattening→fully connected (ANN) layers→output classification
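A minimal Keras sketch of this architecture, assuming a 28×28 grayscale input and a 10-class output (as in the Fashion MNIST dataset below); the filter count is illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                  # input image (height, width, channels)
    layers.Conv2D(32, (3, 3), activation="relu"),    # convolution: learn 32 3x3 filters
    layers.MaxPooling2D((2, 2)),                     # pooling: shrink each feature map
    layers.Flatten(),                                # flattening: feature maps -> column vector
    layers.Dense(10, activation="softmax"),          # fully connected layer -> output classification
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```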
Generative Adversarial Network (GAN)
Two models competing in a tight feedback loop.
A “generator” NN creates a myriad of new creations, while a “discriminator” NN judges which are real. The generator adjusts its creations to be as realistic as possible. After many iterations, the discriminator will no longer be necessary.
Invented in 2014 by Ian Goodfellow in a pub
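A minimal Keras sketch of the two-network feedback loop, using small dense models on flattened 28×28 images; all layer sizes and hyperparameters are illustrative assumptions, not a tuned implementation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 64  # size of the random noise vector (illustrative)

# Generator: maps random noise to a fake flattened 28x28 image
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
])

# Discriminator: classifies a flattened image as real (1) or generated (0)
discriminator = keras.Sequential([
    keras.Input(shape=(28 * 28,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze the discriminator so only the generator learns to fool it
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_images, batch_size=32):
    """One iteration of the feedback loop on a batch of real images, shape (batch, 784), values in [0, 1]."""
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    # 1. Train the discriminator to tell real images from generated ones
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2. Train the generator (through the combined model) to make its output look "real"
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```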
Deep Learning hardware
GPUs most common as they are good for parallel processing
New AI-specific chips:
Google: tensor processing unit (TPU) (2016)
Amazon: AWS Inferentia
Facebook and Intel: Joint AI chip
Intel: Nervana Neural Network Processor
Tesla: can process 36 trillion operations per second
Deep Learning Applications
Domains where there are a large number of input features and where there are large datasets available
Medical: detecting Alzheimer’s and other diseases, improving the accuracy of MRI and PET scans
Speech and Text-to-Speech Generation: Digital assistants, handwriting transcription
Computer Vision: Face recognition, image classification, activity recognition, self-driving cars
Fashion MNIST Dataset
70k images and 10 categories of clothing
28×28 pixel images. Each raw pixel value (0-255) is normalized to a scale between 0 (lightest) and 1 (darkest)
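A minimal sketch of loading the dataset with Keras (keras.datasets.fashion_mnist) and scaling the raw 0-255 pixel values onto the 0-1 scale described above; the reshape at the end adds the single grayscale channel a CNN expects.

```python
from tensorflow import keras

# 60,000 training images + 10,000 test images = 70k total, 10 clothing categories
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print(x_train.shape)   # (60000, 28, 28)

# Scale raw 0-255 pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channel dimension so the images can be fed to a CNN as (28, 28, 1) grids
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
```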
Spatial Integrity
How pixels combine with one another to create features.
Many ANNs can only work with images if they are flattened into a 1D vector. CNNs maintain spatial integrity: data can be input as a 2D grid, and color images can be handled as three stacked grids (one per RGB channel)
Kernel
A filter: a small grid of weights that is slid across the input image
CNN Feature Extraction
First layers learn basic feature detection filters: edge, corner, etc.
Middle layers learn filters to detect parts of objects: eye, nose, etc.
Last layers learn filters for full objects in different shapes and positions
In a CNN, a convolution is performed on input data with a filter to produce a feature map
Convolutional filter
A set of weights that are applied to pixel values in the input image. Weights are learned through backpropagation in the training phase
Examples:
Vertical edge detection: [-1 0 1; -2 0 2; -1 0 1]
Horizontal edge detection: [-1 -2 -1; 0 0 0; 1 2 1]
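A minimal numpy sketch of sliding the vertical edge-detection filter above over a small image (stride 1, no padding); the toy image is an illustrative assumption. As in CNN libraries, this computes cross-correlation (no kernel flip).

```python
import numpy as np

def convolve2d(image, kernel):
    """Apply a filter to an image with stride 1 and no padding ("valid" output)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the pixel values under the filter window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Vertical edge-detection filter from the example above
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

# Toy 4x4 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

print(convolve2d(image, kernel))  # strong responses where the vertical edge is
```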
Feature map
Shows the result of applying filters to an input image. Usually we want a feature map that is the same size as the original image
Also known as an Activation Map
Stride
How many pixels the filter moves each time it processes a group of pixels. Larger strides result in smaller feature maps but can potentially miss important features
Padding
“Extra space” around an image being processed that allows pixels on the edge of the image to be fully processed by the filter
If the stride S=1 and the filter is of size F×F, then the padding needed for a same-size output is P=(F-1)/2 (e.g. P=1 for a 3×3 filter)
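A small sketch of the standard output-size arithmetic, out = (N + 2P - F)/S + 1, which ties stride and padding together; the helper name and example sizes are illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the feature map for an NxN input, FxF filter, padding P, stride S."""
    return (n + 2 * p - f) // s + 1

# 28x28 input, 3x3 filter, stride 1, no padding -> 26x26 feature map
print(conv_output_size(28, 3, p=0, s=1))  # 26

# "Same" padding at stride 1: P = (F - 1) / 2 = 1 for a 3x3 filter -> 28x28 output
print(conv_output_size(28, 3, p=1, s=1))  # 28

# A larger stride shrinks the feature map: stride 2, padding 1 -> 14x14
print(conv_output_size(28, 3, p=1, s=2))  # 14
```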
Pooling
Takes place after feature maps are passed through the ReLU activation function
Goal is to reduce feature map size without losing information (dimensionality reduction)
Variants:
Max pooling: Takes the maximum pixel value within the filter (efficient at maintaining edges)
Average pooling: Takes the average pixel value within the filter
Sum pooling: Sums the pixel values within the filter
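A minimal numpy sketch of 2×2 max pooling with stride 2; swapping .max() for .mean() or .sum() gives average or sum pooling. The example feature map is illustrative.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Reduce a feature map by taking the maximum value in each window."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()   # use .mean() or .sum() for average/sum pooling
    return out

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 5, 7],
                        [1, 2, 3, 4]], dtype=float)

print(max_pool2d(feature_map))  # [[6. 2.]
                                #  [2. 7.]]
```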
Flattening
Flatten a pooled feature map into a column vector. This vector is passed through an ANN for further processing
[1 2 3; 4 5 6; 7 8 9] → [1 2 3 4 5 6 7 8 9]^T
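The same flattening step with numpy, reproducing the 3×3 example above:

```python
import numpy as np

pooled = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

flat = pooled.flatten().reshape(-1, 1)   # column vector [1 2 3 4 5 6 7 8 9]^T
print(flat.shape)                        # (9, 1) - ready to feed into a fully connected ANN
```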