Computer Vision
The design of computer systems that possess the ability to capture, understand, and interpret important visual information contained within image and video data.
Best approach: neural networks and deep learning
Image representation
The features of an image are broken down into a set of pixel intensity values
Traditional vs. New Machine Learning (image recognition)
Traditional: Input image data→feature extractor→features→machine learning algorithm→output classification
Deep learning: Input image data→deep learning algorithm→output classification
Deep learning
A subfield of machine learning inspired by how the brain is structured and operates
Term became popular in the mid-2000s
“Deep” refers to the number of hidden layers in the neural network
Neural Networks
A model of reasoning based on the brain: a nonlinear, parallel information-processing system
Neuron
Basic information processing unit
Soma - Axon - Dendrites
The brain has nearly 100 billion neurons and 60 trillion connections (synapses)
Artificial Neural Networks (ANN)
Consists of a number of very simple processors (neurons/perceptrons) connected by weighted links passing signals from one to another.
Weights can either be excitatory (positive value, increases probability of a neuron firing) or inhibitory (negative value, decreases probability of a neuron firing)
McCulloch & Pitts (1943)
Defined the first mathematical model of an artificial neuron (the basis of the later perceptron)
Activation
The weighted sum of inputs to a perceptron
Activation Function
A function that calculates the output of a perceptron based on the weighted sum of its inputs. Note that activation functions are non-linear, which allows the network to approximate complex functions.
Examples: step, sign, sigmoid, ReLU, softmax
Also called a squashing function if it maps outputs to the range [0,1] or [-1,1]
Rectified Linear Unit (ReLU)
The most popular modern choice of activation function. Linear for all positive values, zero for all negative values. Quick to compute
y = max(0,x)
Good for CNNs, as only nodes with positive activation are used; this reduces the amount of processing necessary and reduces noise in the network
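A minimal numpy sketch of the activation functions named above (step, sign, sigmoid, ReLU, softmax); the sample input values are just for illustration.

```python
import numpy as np

def step(x):
    return np.where(x >= 0, 1, 0)        # fires (1) once the threshold is reached

def sign(x):
    return np.where(x >= 0, 1, -1)       # maps to the range [-1, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes to the range (0, 1)

def relu(x):
    return np.maximum(0, x)              # y = max(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))            # subtract the max for numerical stability
    return e / e.sum()                   # outputs sum to 1 (a probability distribution)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))     # [0.  0.  0.  1.5]
print(sigmoid(x))  # values between 0 and 1
```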
Bias
An additional learnable parameter for each neuron, shifts the activation function up or down (i.e. shifts threshold for neuron firing)
Backpropagation
Algorithm by which weights can be adjusted, and thus the model can learn
Forward pass
Feed inputs X into the input layer
The input is transformed using the current weights W
Feed the result forward through each hidden layer to the output layer
Backward pass
Calculate the error in the outputs: Error = Output - Target
Travel back from the output layer through the hidden layers, adjusting the weights so that the error is decreased.
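A minimal numpy sketch of one forward and one backward pass for a tiny fully connected network (2 inputs, 2 hidden neurons, 1 output, sigmoid activations); the layer sizes, learning rate, and example data are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0.5, 0.8]])           # one input example (illustrative)
target = np.array([[1.0]])           # desired output
W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))
lr = 0.1                             # learning rate (illustrative)

# Forward pass: propagate the input through each layer using the current weights
hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

# Backward pass: Error = Output - Target, then adjust weights to decrease the error
error = output - target
d_output = error * output * (1 - output)               # gradient at the output layer
d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)   # error propagated back to the hidden layer

W2 -= lr * hidden.T @ d_output
b2 -= lr * d_output
W1 -= lr * X.T @ d_hidden
b1 -= lr * d_hidden
```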
Fully connected neural network
Every neuron in one layer is connected to every neuron in the next layer
The most basic and very popular type of network
Recurrent Neural Network (RNN)
Works with sequence prediction problems
Processes prior inputs across time, in addition to the current input
Use for: text, speech, classification prediction, regression prediction
Don’t use for: tabular data, image data
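A minimal Keras sketch of an RNN for a sequence regression problem; the layer size, sequence shape, and loss are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 20, 1   # e.g. 20 steps of a univariate sequence (illustrative)

model = keras.Sequential([
    keras.Input(shape=(timesteps, features)),
    layers.SimpleRNN(32),     # processes prior inputs across time, in addition to the current input
    layers.Dense(1),          # regression prediction for the next value
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```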
Convolutional Neural Network (CNN)
Maps image data to an output variable
Ability to develop an internal representation of a two-dimensional image
Good for handwriting recognition and natural language processing
Use for: image data, classification prediction, regression prediction
Architecture: Input image→convolution→pooling→flattening→fully connected (ANN) layers→output classification
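A minimal Keras sketch of this architecture, assuming a 28×28 grayscale input and a 10-class output (as in the Fashion MNIST dataset below); the filter count is illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                  # input image (height, width, channels)
    layers.Conv2D(32, (3, 3), activation="relu"),    # convolution: learn 32 3x3 filters
    layers.MaxPooling2D((2, 2)),                     # pooling: shrink each feature map
    layers.Flatten(),                                # flattening: feature maps -> column vector
    layers.Dense(10, activation="softmax"),          # fully connected layer -> output classification
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```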
Generative Adversarial Network (GAN)
Two models competing in a tight feedback loop.
A “generator” NN creates a myriad of new creations, while a “discriminator” NN judges which are real. The generator adjusts its creations to be as realistic as possible. After many iterations, the discriminator will no longer be necessary.
Invented in 2014 by Ian Goodfellow in a pub
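A minimal Keras sketch of the two-network feedback loop, using small dense models on flattened 28×28 images; all layer sizes and hyperparameters are illustrative assumptions, not a tuned implementation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 64  # size of the random noise vector (illustrative)

# Generator: maps random noise to a fake flattened 28x28 image
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
])

# Discriminator: classifies a flattened image as real (1) or generated (0)
discriminator = keras.Sequential([
    keras.Input(shape=(28 * 28,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze the discriminator so only the generator learns to fool it
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_images, batch_size=32):
    """One iteration of the feedback loop on a batch of real images, shape (batch, 784), values in [0, 1]."""
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    # 1. Train the discriminator to tell real images from generated ones
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2. Train the generator (through the combined model) to make its output look "real"
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```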
Deep Learning hardware
GPUs most common as they are good for parallel processing
New AI-specific chips:
Google: tensor processing unit (TPU) (2016)
Amazon: AWS Inferentia
Facebook and Intel: Joint AI chip
Intel: Nervana Neural Network Processor
Tesla: can process 36 trillion operations per second
Deep Learning Applications
Domains where there are a large number of input features and where there are large datasets available
Medical: detecting Alzheimer’s and other diseases, improving the accuracy of MRI and PET scans
Speech and Text-to-Speech Generation: Digital assistants, handwriting transcription
Computer Vision: Face recognition, image classification, activity recognition, self-driving cars
Fashion MNIST Dataset
70k images and 10 categories of clothing
28×28 pixel images. Each raw pixel value (0-255) is normalized to a scale between 0 (lightest) and 1 (darkest)
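A minimal sketch of loading the dataset with Keras (keras.datasets.fashion_mnist) and scaling the raw 0-255 pixel values onto the 0-1 scale described above; the reshape at the end adds the single grayscale channel a CNN expects.

```python
from tensorflow import keras

# 60,000 training images + 10,000 test images = 70k total, 10 clothing categories
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print(x_train.shape)   # (60000, 28, 28)

# Scale raw 0-255 pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channel dimension so the images can be fed to a CNN as (28, 28, 1) grids
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
```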
Spatial Integrity
How pixels combine with one another to create features.
Many ANNs can only work with images if they are flattened into a 1D vector. CNNs maintain spatial integrity: data can be input as a 2D grid, and color images can be handled as three stacked grids (one per RGB channel)
Kernel
A filter: a small grid of weights that is slid across the input image
CNN Feature Extraction
First layers learn basic feature detection filters: edge, corner, etc.
Middle layers learn filters to detect parts of objects: eye, nose, etc.
Last layers learn filters for full objects in different shapes and positions
In a CNN, a convolution is performed on input data with a filter to produce a feature map
Convolutional filter
A set of weights that are applied to pixel values in the input image. Weights are learned through backpropagation in the training phase
Examples:
Vertical edge detection: [-1 0 1; -2 0 2; -1 0 1]
Horizontal edge detection: [-1 -2 -1; 0 0 0; 1 2 1]
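A minimal numpy sketch of sliding the vertical edge-detection filter above over a small image (stride 1, no padding); the toy image is an illustrative assumption. As in CNN libraries, this computes cross-correlation (no kernel flip).

```python
import numpy as np

def convolve2d(image, kernel):
    """Apply a filter to an image with stride 1 and no padding ("valid" output)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the pixel values under the filter window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Vertical edge-detection filter from the example above
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

# Toy 4x4 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

print(convolve2d(image, kernel))  # strong responses where the vertical edge is
```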
Feature map
Shows the result of applying filters to an input image. Usually we want a feature map that is the same size as the original image
Also known as an Activation Map
Stride
How many pixels the filter moves each time it processes a group of pixels. Larger strides result in smaller feature maps but can potentially miss important features
Padding
“Extra space” around an image being processed that allows pixels on the edge of the image to be fully processed by the filter
If the stride S=1 and the filter is of size F×F, then the padding needed for a same-size output is P=(F-1)/2 (e.g. P=1 for a 3×3 filter)
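A small sketch of the standard output-size arithmetic, out = (N + 2P - F)/S + 1, which ties stride and padding together; the helper name and example sizes are illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the feature map for an NxN input, FxF filter, padding P, stride S."""
    return (n + 2 * p - f) // s + 1

# 28x28 input, 3x3 filter, stride 1, no padding -> 26x26 feature map
print(conv_output_size(28, 3, p=0, s=1))  # 26

# "Same" padding at stride 1: P = (F - 1) / 2 = 1 for a 3x3 filter -> 28x28 output
print(conv_output_size(28, 3, p=1, s=1))  # 28

# A larger stride shrinks the feature map: stride 2, padding 1 -> 14x14
print(conv_output_size(28, 3, p=1, s=2))  # 14
```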
Pooling
Takes place after feature maps are passed through the ReLU activation function
Goal is to reduce feature map size without losing information (dimensionality reduction)
Variants:
Max pooling: Takes the maximum pixel value within the filter (efficient at maintaining edges)
Average pooling: Takes the average pixel value within the filter
Sum pooling: Sums the pixel values within the filter
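A minimal numpy sketch of 2×2 max pooling with stride 2; swapping .max() for .mean() or .sum() gives average or sum pooling. The example feature map is illustrative.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Reduce a feature map by taking the maximum value in each window."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()   # use .mean() or .sum() for average/sum pooling
    return out

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 5, 7],
                        [1, 2, 3, 4]], dtype=float)

print(max_pool2d(feature_map))  # [[6. 2.]
                                #  [2. 7.]]
```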
Flattening
Flatten a pooled feature map into a column vector. This vector is passed through an ANN for further processing
[1 2 3; 4 5 6; 7 8 9] → [1 2 3 4 5 6 7 8 9]^T
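The same flattening step with numpy, reproducing the 3×3 example above:

```python
import numpy as np

pooled = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

flat = pooled.flatten().reshape(-1, 1)   # column vector [1 2 3 4 5 6 7 8 9]^T
print(flat.shape)                        # (9, 1) - ready to feed into a fully connected ANN
```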