07 Deep Learning

Deep Learning Overview

  • Course: MACHINE LEARNING COMP.5450

  • Instructor: Dr. Ruizhe Ma

ArgMax and SoftMax

  • ArgMax: Identifies the largest value from the output layer.

    • Example: If output values are [0.6, 0.1, 0.2], ArgMax returns index 0, since 0.6 is the largest value.

    • Commonly used for testing.

  • SoftMax: Converts a vector of K real numbers into a probability distribution.

    • Ensures all probabilities sum to 1.

    • Commonly used for training.

    • Formula: \[ P(y_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \]

    • Where \( z_i \) is the raw output (logit) for class \( i \).
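
The relationship between the two operations can be sketched in a few lines of NumPy (logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    # because e^(z - c) / sum(e^(z - c)) = e^z / sum(e^z).
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 0.5, 1.0])   # raw outputs (logits) for K = 3 classes
p = softmax(z)
print(p.sum())                  # probabilities sum to 1
print(np.argmax(p))             # index of the most likely class: 0
```

Since softmax is monotonic, ArgMax gives the same class whether applied to the raw logits or to the probabilities, which is why ArgMax alone suffices at test time.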

Deep Architectures

  • More hidden layers make an architecture deeper.

  • Definition: A deep architecture is composed of multiple non-linear operations; a network with more than one hidden layer generally qualifies as deep learning.

Vanishing Gradient Problem

  • In feedforward networks, weight updates are proportional to the gradient, so larger gradients produce larger adjustments to the weights.

  • During backpropagation, gradients can diminish layer by layer such that earlier layers may not learn effectively.

  • The issue arises because gradients are multiplied layer by layer as they propagate backward through many layers; the product can shrink toward zero (the vanishing gradient).
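
The shrinking effect can be demonstrated numerically: the derivative of the sigmoid never exceeds 0.25, so a chain of 10 sigmoid layers scales the gradient by at most 0.25^10 (a toy sketch, ignoring weight terms):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1 - s)   # maximum value is 0.25, at x = 0

# The gradient reaching the first layer is (roughly) a product of one
# derivative factor per layer, so it shrinks geometrically with depth.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_prime(0.0)   # 0.25, the sigmoid's steepest point

print(grad)   # 0.25 ** 10, roughly 9.5e-7
```

Even in this best case the gradient reaching layer 1 is under one millionth of its original size, which is why early layers stop learning.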

Deep Architecture in the Brain

  • Area V1 detects edges (1st layer).

  • Area V2 recognizes primitive shapes (2nd layer).

  • Area V4 identifies higher-level abstractions and objects (3rd layer).

  • Deep neural networks mirror this layered hierarchy, processing visual information in successive stages much as the visual cortex does.

Theoretical Advantages of Deep Architectures

  • Dynamic Standardization: Network design choices are largely settled by trial and error rather than fixed standards, allowing flexibility.

  • Function Representation: Some complex functions are not efficiently represented by shallow architectures.

  • More formally, functions that a depth-k architecture can represent compactly may require a much larger architecture at depth (k-1).

    • Computational consequences: Fewer computational elements (nodes) are needed in each layer.

    • Statistical implications: Insufficient depth can lead to poor generalization.

Auto Encoder

  • Structure: Similar input and output layers, bottleneck hidden layer with fewer nodes.

  • Purpose: Compress data while maintaining quality; learns without supervision.

  • Components:

    • Encoding function to compress data.

    • Decoding function for reconstruction.

    • Measuring reconstruction loss to assess decoder performance.

  • Main usages: Data denoising and dimensionality reduction.
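
The encode/decode/loss structure above can be sketched as a minimal linear autoencoder in NumPy (layer sizes and random weights are illustrative; a real model would train the weights to minimize the loss):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3   # bottleneck: 3 hidden nodes < 8 input nodes

# Untrained random weights, for structure only
W_enc = rng.normal(size=(n_hidden, n_in)) * 0.1
W_dec = rng.normal(size=(n_in, n_hidden)) * 0.1

def encode(x):
    return np.tanh(W_enc @ x)   # compress 8 values down to 3

def decode(h):
    return W_dec @ h            # reconstruct 8 values from 3

def reconstruction_loss(x):
    # Mean squared error between input and its reconstruction;
    # training would adjust W_enc and W_dec to minimize this.
    return np.mean((x - decode(encode(x))) ** 2)

x = rng.normal(size=n_in)
print(encode(x).shape)          # (3,)  -- the compressed code
print(reconstruction_loss(x))   # non-negative scalar
```

Because the loss compares the input to itself after reconstruction, no labels are needed, which is what makes the learning unsupervised.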

Recurrent Neural Networks (RNNs)

  • Designed for temporal data dependencies (e.g., time series, language processing).

  • Contextual importance: Past decisions impact current outcomes.

  • Applications include predictions, text summarization, speech recognition, etc.
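
How past inputs influence current outcomes can be seen in a single recurrent step: the new hidden state depends on both the current input and the previous hidden state. A minimal sketch (sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 4, 5   # assumed input and hidden sizes

W_xh = rng.normal(size=(n_hid, n_in)) * 0.1    # input-to-hidden weights
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1   # hidden-to-hidden (recurrent)
b = np.zeros(n_hid)

def rnn_step(x, h_prev):
    # The recurrent term W_hh @ h_prev carries context forward in time,
    # so earlier inputs affect the current state.
    return np.tanh(W_xh @ x + W_hh @ h_prev + b)

h = np.zeros(n_hid)
for x in rng.normal(size=(3, n_in)):   # a sequence of 3 time steps
    h = rnn_step(x, h)
print(h.shape)                         # (5,)
```

The same weights W_xh and W_hh are reused at every time step, which is what lets an RNN handle sequences of any length.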

Problems with RNNs

  1. Complex Training: Difficulty in remembering information long-term.

  2. Exploding Gradient Problem: Occurs when large error gradients accumulate.

  3. Vanishing Gradient Problem: The derivative of loss approaches zero, hindering learning in deeper layers.

Activation Functions: Sigmoid and Tanh

  • Sigmoid: Outputs lie in (0, 1), which is useful for binary classification.

  • Tanh: A rescaled sigmoid whose outputs lie in (-1, 1).

  • Both can lead to problems like saturation in deep networks.
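
The output ranges and the saturation problem can both be checked numerically (a small NumPy sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 5)
print(sigmoid(x))   # all values in (0, 1)
print(np.tanh(x))   # all values in (-1, 1); tanh(x) = 2*sigmoid(2x) - 1

# Saturation: far from 0 the curves flatten out, so the derivative
# (and hence the gradient) is nearly zero.
print(sigmoid(10) * (1 - sigmoid(10)))   # roughly 4.5e-5
```

Near-zero derivatives at saturated units are one source of the vanishing gradient problem in deep networks.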

Memory in Neural Networks

  • RNNs often struggle with short-term memory.

  • LSTMs and GRUs: Address memory issues through memory cells that manage long-term dependencies by controlling what information to remember or forget.

Long Short Term Memory (LSTM)

  • Enhancements include separate long-term (cell) and short-term (hidden) states, managed by gates.

  • LSTM helps in processing data with temporal dependencies effectively.
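
A single LSTM step can be sketched to show how the gates manage the two states (sizes and random weights are illustrative; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 3, 4   # assumed input and hidden sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate, each acting on [h_prev, x]
Wf, Wi, Wo, Wc = (rng.normal(size=(n_hid, n_hid + n_in)) * 0.1
                  for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z)                     # forget gate: what to drop
    i = sigmoid(Wi @ z)                     # input gate: what new info to store
    o = sigmoid(Wo @ z)                     # output gate: what to expose
    c = f * c_prev + i * np.tanh(Wc @ z)    # long-term cell state
    h = o * np.tanh(c)                      # short-term hidden state
    return h, c

h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):        # a 5-step input sequence
    h, c = lstm_step(x, h, c)
print(h.shape, c.shape)                     # (4,) (4,)
```

The additive update of the cell state (f * c_prev + new content) lets gradients flow further back in time than a plain RNN's repeated matrix multiplications allow.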

Gated Recurrent Units (GRUs)

  • Simplified version of LSTMs that combines forget and input gates into a single update gate.

  • GRUs are more parameter-efficient and faster to train compared to LSTMs.
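
The merged update gate can be sketched in the same style as the LSTM step above; note only three weight matrices are needed instead of four, which is where the parameter savings come from (sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_hid = 3, 4   # assumed input and hidden sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Three weight matrices (an LSTM needs four), biases omitted for brevity
Wz, Wr, Wh = (rng.normal(size=(n_hid, n_hid + n_in)) * 0.1
              for _ in range(3))

def gru_step(x, h_prev):
    zx = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ zx)   # update gate: merged forget + input gate
    r = sigmoid(Wr @ zx)   # reset gate: how much past state to consult
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]))
    # The single gate z both forgets old state (1 - z) and admits new (z).
    return (1 - z) * h_prev + z * h_cand

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = gru_step(x, h)
print(h.shape)   # (4,)
```

There is also no separate cell state: the hidden state alone carries memory, further reducing the work per step.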

Image Recognition

  • Image classification is challenging: many different objects must be detected and categorized accurately.

Convolutional Neural Networks (CNN)

  • CNNs manage spatial hierarchies in images, with convolutional and pooling layers crucial for feature extraction.

  • Pooling: Reduces spatial size, helpful in decreasing computational load.

    • Types: Max Pooling (extracts max values) and Average Pooling (extracts average values).

  • Effective at managing high-dimensional inputs; weight sharing and pooling reduce the parameter count, which helps avoid overfitting and improves model performance.
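
Both pooling types can be sketched over non-overlapping 2x2 blocks in NumPy (the example image values are made up):

```python
import numpy as np

def pool2x2(img, op):
    h, w = img.shape
    # View the image as non-overlapping 2x2 blocks, then reduce each
    # block with the given operation (np.max or np.mean).
    blocks = img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return op(blocks, axis=(1, 3))

img = np.array([[1., 3., 2., 0.],
                [4., 2., 1., 1.],
                [0., 1., 5., 2.],
                [2., 2., 3., 4.]])

print(pool2x2(img, np.max))    # max pooling: [[4., 2.], [2., 5.]]
print(pool2x2(img, np.mean))   # average pooling: [[2.5, 1.], [1.25, 3.5]]
```

Either way, a 4x4 input becomes a 2x2 output, quartering the spatial size (and the computation in later layers) while keeping a summary of each region.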

Applications of CNNs

  • Case studies like Fashion-MNIST, automatic image colorization, and caption generation for visual data.

Neural Network Limitations

  • Vulnerability to adversarial attacks (e.g., image alterations leading to misclassification).

  • Generalization issues faced by DNNs in image perception and recognition.

Conclusion

  • Understanding the technological framework behind deep learning and its neural networks is essential for applications in various fields like computer vision, natural language processing, and more.