07 Deep Learning
Deep Learning Overview
Course: MACHINE LEARNING COMP.5450
Instructor: Dr. Ruizhe Ma
ArgMax and SoftMax
ArgMax: Returns the index of the largest value in the output layer.
Example: For output values [0.6, 0.1, 0.2], ArgMax returns index 0, selecting the first class (the one with value 0.6).
Commonly used for testing.
SoftMax: Converts a vector of K real numbers into a probability distribution.
Ensures all probabilities sum to 1.
Commonly used for training.
Formula: \[ P(y_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \]
Where \( z_i \) represents the raw output (logit) for class i.
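To make the ArgMax/SoftMax distinction concrete, here is a minimal numpy sketch (the score values are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Convert raw scores z into a probability distribution summing to 1."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical output-layer logits
probs = softmax(scores)              # probabilities sum to 1
predicted_class = np.argmax(probs)   # ArgMax picks the index of the largest entry
```

Note that ArgMax is not differentiable, which is why SoftMax is used during training while ArgMax is typically applied only at test time.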
Deep Architectures
More hidden layers yield a deeper architecture capable of learning more abstract representations.
Definition: A deep architecture is composed of multiple levels of non-linear operations; a network with more than 1 hidden layer generally qualifies as deep learning.
Vanishing Gradient Problem
The magnitude of the gradient determines how much the weights are adjusted in feedforward networks.
During backpropagation, gradients can diminish layer by layer such that earlier layers may not learn effectively.
The issue arises as gradients propagate back through many layers, becoming smaller and potentially reaching zero (vanishing gradient).
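The shrinking effect can be seen numerically: the sigmoid derivative is at most 0.25, so each layer of backpropagation can multiply the gradient by a factor of 0.25 or less. A small sketch (10 hypothetical layers, each at the best-case pre-activation of 0):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

grad = 1.0
for layer in range(10):
    x = 0.0                              # pre-activation where the derivative peaks
    d = sigmoid(x) * (1 - sigmoid(x))    # sigmoid derivative, equals 0.25 at x = 0
    grad *= d                            # gradient shrinks at every layer

# After 10 layers the surviving gradient is 0.25**10, under one millionth
```

Even in this best case the gradient reaching the earliest layer is about 9.5e-7, which is why early layers learn so slowly.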
Deep Architecture in the Brain
Area V1 detects edges (1st layer).
Area V2 recognizes primitive shapes (2nd layer).
Area V4 identifies higher-level abstractions and objects (3rd layer).
Deep neural networks process visual information in a layered fashion analogous to this biological hierarchy.
Theoretical Advantages of Deep Architectures
Design flexibility: Neural network architectures are largely chosen by trial & error, which allows flexible designs.
Function Representation: Some complex functions are not efficiently represented by shallow architectures.
More formally, a depth-k architecture can compactly represent certain functions that would require exponentially many elements at depth (k-1).
Computational consequence: Fewer computational elements are needed per layer.
Statistical implications: Insufficient depth can lead to poor generalization.
Auto Encoder
Structure: Similar input and output layers, bottleneck hidden layer with fewer nodes.
Purpose: Compress data while maintaining quality; learns without supervision.
Components:
Encoding function to compress data.
Decoding function for reconstruction.
Measuring reconstruction loss to assess decoder performance.
Main usages: Data denoising and dimensionality reduction.
Recurrent Neural Networks (RNNs)
Designed for temporal data dependencies (e.g., time series, language processing).
Contextual importance: Past decisions impact current outcomes.
Applications include predictions, text summarization, speech recognition, etc.
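The defining feature, past inputs influencing current outputs, comes from a hidden state that is fed back at each time step. A minimal untrained RNN step in numpy (the dimensions and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 4, 5  # hypothetical sizes

W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1   # input-to-hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden (recurrence)
b_h = np.zeros(hidden_dim)

def rnn_step(h_prev, x_t):
    # The new hidden state mixes the current input with the previous state,
    # so every earlier input indirectly affects the current output.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_dim)
sequence = rng.normal(size=(3, input_dim))  # three time steps
for x_t in sequence:
    h = rnn_step(h, x_t)
```

The repeated multiplication by W_hh across many steps is exactly what makes RNN gradients explode or vanish, as discussed next.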
Problems with RNNs
Complex Training: Difficulty in remembering information long-term.
Exploding Gradient Problem: Occurs when large error gradients accumulate.
Vanishing Gradient Problem: The derivative of loss approaches zero, hindering learning in deeper layers.
Activation Functions: Sigmoid and Tanh
Sigmoid: Output lies in (0, 1), useful for binary classification.
Tanh: Shifts sigmoid-like outputs to the range (-1, 1).
Both can lead to problems like saturation in deep networks.
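The saturation problem is easy to demonstrate: at large |x| both curves flatten out, so their derivatives, and hence the gradients flowing through them, are nearly zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 201)
s = sigmoid(x)       # stays strictly inside (0, 1)
t = np.tanh(x)       # stays strictly inside (-1, 1)

# Saturation: far from 0 the sigmoid slope is nearly zero,
# so a neuron stuck out here passes almost no gradient back.
slope_at_10 = sigmoid(10) * (1 - sigmoid(10))
```

A slope of roughly 4.5e-5 at x = 10 is why saturated sigmoid/tanh units contribute to the vanishing gradient problem in deep networks.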
Memory in Neural Networks
RNNs often struggle with short-term memory.
LSTMs and GRUs: Address memory issues through memory cells that manage long-term dependencies by controlling what information to remember or forget.
Long Short Term Memory (LSTM)
Enhancements include long-term and short-term states.
LSTM helps in processing data with temporal dependencies effectively.
Gated Recurrent Units (GRUs)
Simplified version of LSTMs that combines forget and input gates into a single update gate.
GRUs are more parameter-efficient and faster to train compared to LSTMs.
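The GRU's update/reset gating described above can be sketched in numpy. This is an untrained toy cell with hypothetical dimensions, meant only to show how the update gate blends the old state with a candidate state:

```python
import numpy as np

rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 4  # hypothetical sizes

def weight(out_dim, in_dim):
    return rng.normal(size=(out_dim, in_dim)) * 0.1

# Update gate z, reset gate r, candidate state (all weights untrained)
W_z, U_z = weight(hidden_dim, input_dim), weight(hidden_dim, hidden_dim)
W_r, U_r = weight(hidden_dim, input_dim), weight(hidden_dim, hidden_dim)
W_h, U_h = weight(hidden_dim, input_dim), weight(hidden_dim, hidden_dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)             # update gate: how much to rewrite
    r = sigmoid(W_r @ x_t + U_r @ h_prev)             # reset gate: how much past to use
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev))  # candidate new state
    return (1 - z) * h_prev + z * h_cand              # blend old state and candidate

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(h, x_t)
```

An LSTM uses the same gating idea but keeps a separate cell state and splits the update into distinct forget and input gates, which is why GRUs have fewer parameters.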
Image Recognition
Image classification poses challenges such as accurately detecting and categorizing varied objects.
Convolutional Neural Networks (CNN)
CNNs manage spatial hierarchies in images, with convolutional and pooling layers crucial for feature extraction.
Pooling: Reduces spatial size, helpful in decreasing computational load.
Types: Max Pooling (extracts max values) and Average Pooling (extracts average values).
CNNs handle high-dimensional inputs effectively and help avoid overfitting, improving model performance.
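Max and average pooling can be shown directly in numpy. A small sketch that downsamples a 4x4 image with non-overlapping 2x2 windows (the image values are made up):

```python
import numpy as np

def pool2x2(image, mode="max"):
    """Downsample a 2D array with non-overlapping 2x2 windows."""
    h, w = image.shape
    blocks = image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # max pooling: keep the strongest response
    return blocks.mean(axis=(1, 3))       # average pooling: keep the mean response

img = np.array([[1., 2., 5., 6.],
                [3., 4., 7., 8.],
                [0., 1., 1., 0.],
                [2., 3., 0., 1.]])
max_pooled = pool2x2(img, "max")     # [[4, 8], [3, 1]]
avg_pooled = pool2x2(img, "mean")    # [[2.5, 6.5], [1.5, 0.5]]
```

Either way the 4x4 input shrinks to 2x2, which is how pooling reduces spatial size and computational load.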
Applications of CNNs
Case studies like Fashion-MNIST, automatic image colorization, and caption generation for visual data.
Neural Network Limitations
Vulnerability to adversarial attacks (e.g., image alterations leading to misclassification).
Generalization issues faced by DNNs in image perception and recognition.
Conclusion
Understanding the technological framework behind deep learning and its neural networks is essential for applications in various fields like computer vision, natural language processing, and more.