Introduction to Deep Learning

  • Course Name: ADM3308

  • Institution: Telfer School of Management, University of Ottawa

Course Outline

  • Introduction

  • Convolutional Neural Networks (CNN)

  • Autoencoder

  • Recurrent Neural Networks (RNN)

  • Long Short Term Memory (LSTM)

  • Appendix: Deep learning software

Introduction

  • In 2006, AI researcher Geoffrey Hinton published a pivotal paper that raised awareness of deep learning.

  • Deep Learning Overview:

    • Involves complex networks with multiple layers.

    • Creates flexible models that uncover buried information in vast datasets more efficiently than traditional machine learning techniques that rely on hand-crafted features.

Classical Machine Learning vs. Deep Learning

  • Classical machine learning techniques:

    • Make predictions directly from a predetermined set of features specified by the user.

  • Deep learning techniques:

    • Use multiple transformation steps to construct complex features.

Analyzing Massive Low-level Data

  • Classical statistical and machine learning models, including neural networks, utilize available informative predictors (e.g., purchase data, bank account details, etc.).

  • Rapidly growing applications in voice and image recognition present numerous low-level granular predictors, such as:

    • Pixel values in images

    • Wave amplitudes in audio

  • Deep Learning's Impact:

    • Significant advancements in speech recognition, computer vision, and natural language processing.

Deep Learning for Image Processing (Unsupervised Learning)

  • Context:

    • In image recognition, pixel values serve as predictors, and their number often exceeds 100,000.

    • The critical ability of deep learning models is to learn features without supervision.

  • Example:

    • Separate pixels in an image (e.g., a football field) into distinct areas, such as "green field" versus "yard markers," without prior knowledge of these concepts.

    • This leads to the emergence of boundaries and edges.

    • The learning process transitions from identifying local, simple features to encompassing global, complex features.

Example of Feature Detection

  • Task: Instructing a machine to find an eye in an image.

  • Features of Interest:

    • A small solid circle representing the pupil.

    • An iris surrounding the pupil.

    • A surrounding white area.

Simplified Image Representation

  • Assume an image composed of 14×7 pixels.

  • Each pixel value is a color code ranging from 0 to 255.

  • Example of a representative 14x7 matrix (values are arbitrary):

    • 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0

    • 0, 1, …, 1, 0, 0

    • 1, 0, …, 5, 5, 5, …, 0, 0, 1, 0

    • etc.
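
A minimal NumPy sketch of such a pixel matrix (the first row is the one listed above; the remaining rows are arbitrary placeholder values, not taken from the slides):

```python
import numpy as np

# A small grayscale image is just a matrix of colour codes in 0-255.
# The first row matches the slide; the other six rows are arbitrary.
first_row = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rng = np.random.default_rng(0)
other_rows = rng.integers(0, 256, size=(6, 14))
image = np.vstack([first_row, other_rows]).astype(np.uint8)

print(image.shape)  # (7, 14): 7 rows of 14 pixel values each
```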

Convolutional Neural Networks (CNN)

  • Definition:

    • A popular deep learning model implemented for image recognition.

  • Functionality:

    • Aggregate predictors (pixels) instead of assigning individual weights to each one; apply a convolution operation to the grouped pixels.

    • A common aggregation method involves a 3x3 pixel area, e.g., around a person's chin.

What is Convolution?

  • Mathematical Definition:

    • A convolution is a mathematical operation on two functions (say, f and g) that produces a third function describing how the shape of one is modified by the other.

    • Convolution describes the interaction and overlap of two functions as one slides across the other, serving the purpose of extracting and transforming features/signals/data.
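
In discrete form, this operation can be written as follows (a standard textbook definition, added here for reference rather than taken from the slides):

    (f * g)[n] = \sum_{m} f[m] \, g[n - m]

That is, one function is flipped and slid across the other; at each offset n, the overlapping values are multiplied and summed.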

Applying the Convolution

  • Convolution operation process:

    • Multiply the pixel values element-wise by a filter (kernel) matrix, then sum the results.

  • Example Calculation:

    • For a filter that identifies central vertical lines:

    • Calculation: 0×25 + 1×200 + 0×25 + 0×25 + 1×225 + 0×25 + 0×25 + 1×225 + 0×25 = 650

    • Result: This sum is relatively high compared to other arrangements because the pixel values are elevated in the central column (a NumPy sketch of this calculation follows below).
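
A minimal NumPy sketch of this single filter application; the 3×3 patch and the vertical-line filter below are illustrative values chosen to match the calculation above:

```python
import numpy as np

# 3x3 patch of pixel values: high values run down the central column.
patch = np.array([[25, 200, 25],
                  [25, 225, 25],
                  [25, 225, 25]])

# Filter that responds to a vertical line through the centre.
vertical_filter = np.array([[0, 1, 0],
                            [0, 1, 0],
                            [0, 1, 0]])

# Element-wise multiplication followed by a sum: 200 + 225 + 225 = 650.
print(np.sum(patch * vertical_filter))  # 650
```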

Continuing the Convolution Process

  • As the filter matrix shifts across the image:

    • It records a result at each position, producing a smaller matrix that indicates the presence or absence of vertical lines.

  • Other filters can identify horizontal lines, curves, and borders, representing hyper-local features.

    • Further convolutions on these local features yield a multi-dimensional matrix, or tensor, of higher-level features.

Filtering and Pooling Example

  • Example: Detecting edges of an image

  • Sobel filters are applied to the image to detect its edges (a sketch follows below).
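
A brief sketch of this filtering-and-pooling step using SciPy's Sobel filter; the toy image, library choice, and pooling size are assumptions for illustration:

```python
import numpy as np
from scipy import ndimage

# Toy grayscale image: a bright square on a dark background.
image = np.zeros((64, 64))
image[16:48, 16:48] = 255

# Sobel filters approximate horizontal and vertical intensity gradients.
grad_x = ndimage.sobel(image, axis=1)
grad_y = ndimage.sobel(image, axis=0)
edges = np.hypot(grad_x, grad_y)   # high along the square's edges

# Simple 2x2 max pooling: keep the strongest response in each block.
h, w = edges.shape
pooled = edges.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```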

Convolutions Produce Feature Reduction

  • In supervised learning, successful convolutions and features are preserved for tagging images.

  • Feature learning results in fewer, simpler features than the original set of pixel values.
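
For illustration, a minimal Keras sketch (not course code; the input shape and number of classes are arbitrary assumptions) of this supervised pipeline, in which convolution and pooling layers reduce the raw pixels to fewer learned features before a dense layer tags the image:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),             # small grayscale images (assumed)
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                            # reduced set of learned features
    layers.Dense(10, activation="softmax"),      # 10 image tags (assumed)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```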

Unsupervised Learning: Autoencoding

  • Deep learning networks can discover high-level features without the guidance of labels.

  • Structure:

    • The network has a mechanism to generate an image from high-level features at a bottleneck.

    • The generated image is compared with the original; mismatches prompt weight adjustments, as in backpropagation.

    • The network converges toward the configuration that yields the best matches.

Simple Autoencoder

  • Predicts its input.

  • Structure:

    • Includes one hidden layer (simple) or multiple hidden layers (deep autoencoder).
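
A minimal Keras sketch of a simple autoencoder with one hidden (bottleneck) layer, trained to predict its own input; the layer sizes are arbitrary assumptions rather than values from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

input_dim, bottleneck_dim = 784, 32                   # e.g. flattened 28x28 images (assumed)

autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(bottleneck_dim, activation="relu"),  # single hidden layer
    layers.Dense(input_dim, activation="sigmoid"),    # reconstruction of the input
])

# The target is the input itself: the network predicts its input.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)        # x_train assumed available
```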

Representation Comparison

  • Comparison of non-linear autoencoder and PCA in a 2D space showcasing learned data groupings.

  • Source: Hinton and Salakhutdinov (2006).

Recurrent Neural Networks (RNN)

  • Characteristics:

    • Networks possess cycle-forming connections (feedback).

    • Each hidden unit connects to itself and others, providing an internal state ideal for processing sequence data (e.g., handwriting recognition, speech, translation).

    • RNNs can be conceptually unrolled over time to make the computation easier to understand.

    • Each step uses the same weights and biases on the links to the units; that is, the parameters are shared across time (see the recurrence written out below).
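
Written out with shared parameters (standard notation, included as an illustration rather than taken from the slides):

    h_t = \tanh(W_x x_t + W_h h_{t-1} + b), \qquad \hat{y}_t = W_y h_t + c

The same matrices W_x, W_h, W_y and biases b, c are applied at every time step t.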

Memory Loop in RNNs

  • RNNs include a memory loop allowing them to recall past information.

  • Applications: Suitable for time series analysis, sequential data like speech, music, and text.

Data Structure in RNNs

  • Original time series replicated into overlapping sub-series, with each labeled for one-step-ahead forecasting.

  • Predictors formatted as follows:

    • First sub-series: y_1, y_2, …, y_w, with target y_{w+1}

    • Second sub-series: y_2, y_3, …, y_{w+1}, with target y_{w+2}

  • Continuing the pattern, the last sub-series y_{t-w}, …, y_{t-1} forecasts y_t (a windowing sketch follows below).
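
A minimal NumPy sketch of this windowing scheme; the helper-function name and the window length w = 3 are illustrative assumptions:

```python
import numpy as np

def make_windows(series, w):
    """Split a series into overlapping length-w windows, each labelled
    with the next value for one-step-ahead forecasting."""
    X, y = [], []
    for i in range(len(series) - w):
        X.append(series[i:i + w])   # window y_{i+1}, ..., y_{i+w}
        y.append(series[i + w])     # target y_{i+w+1}
    return np.array(X), np.array(y)

series = np.arange(1, 11)           # toy series 1, 2, ..., 10
X, y = make_windows(series, w=3)
print(X[0], y[0])                   # [1 2 3] 4
print(X[-1], y[-1])                 # [7 8 9] 10
```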

The Issue of Vanishing Gradients in RNNs

  • RNNs face challenges with gradient calculation: backpropagating through time decomposes the gradient for the parameters into long products of matrices across the time steps.

  • Due to repetitions of the same matrix across time steps, gradients can vanish to zero or explode to infinity, similar to how magnitudes behave when raised to a power (approaching zero or growing indefinitely).
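
A short NumPy illustration of this behaviour: repeatedly applying the same matrix shrinks a vector toward zero or blows it up, depending on whether the matrix's scale is below or above one (the matrices and counts are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=3)

for scale in (0.5, 1.5):
    W = scale * np.eye(3)        # the same matrix reused at every "time step"
    x = v.copy()
    for _ in range(50):          # 50 repeated multiplications
        x = W @ x
    print(scale, np.linalg.norm(x))
    # scale 0.5 -> norm near zero (vanishing); scale 1.5 -> enormous norm (exploding)
```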

Long Short Term Memory (LSTM)

  • LSTMs address the short-term memory issues faced by RNNs concerning vanishing gradients.

  • They utilize a gate operator (forget gate) enabling information retention, whereby the network adjusts the retention period to optimize prediction performance.

LSTM Architecture and Functions

  • Designed specifically to resolve the vanishing gradient issue.

  • Structure:

    • Memory cells, input gates, output gates, and forget gates are incorporated.

  • Memory cells uniquely maintain information for extended durations.

  • Each cell possesses input and output gates controlled by learnable weights based on present observations and the prior hidden states.

  • Enhances backpropagation by allowing error terms to be stored and propagated without degradation.
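
For illustration, a minimal Keras sketch (not course code) of an LSTM forecasting one step ahead from windows of past values; the window length and layer size are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Input: windows of w = 3 past values, one feature per time step (assumed).
model = models.Sequential([
    layers.Input(shape=(3, 1)),
    layers.LSTM(16),             # memory cells with input, output, and forget gates
    layers.Dense(1),             # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X.reshape(-1, 3, 1), y, epochs=10)   # windows as in the RNN data structure
```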

LSTM Architecture

  • Source: Arpit Rathore, MDTI Project Report, University of Ottawa, 2021.

Appendix: Deep Learning Software

  • The appendix provides additional information on deep learning software; it is not part of the course material and is included for reference.

Deep Learning Software - Theano

  • A Python library focused on deep learning research (Bergstra et al., 2010; Theano Development Team, 2016).

  • A versatile tool for mathematical programming that extends NumPy with symbolic differentiation and GPU support.

  • Features a high-level language for deep learning model expressions and a compiler optimized for performance leveraging GPU capabilities.

  • Supports execution on multiple GPUs.

Theano Features

  • Enables declaring symbolic variables for inputs/targets, with numerical values provided at runtime.

  • Shared variables, such as weights, are tied to values stored in NumPy arrays.

  • Generates symbolic graphs defining mathematical operations comprising variable, constant, apply, and operation nodes.

  • Constant nodes aid optimization by remaining unchanged during computation.
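
A minimal sketch of these ideas in Theano (a symbolic input variable, a shared variable backed by a NumPy array, and a compiled function); the particular computation is an arbitrary illustration:

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic input: no numerical value until the compiled function is called.
x = T.dvector("x")

# Shared variable: its value is stored in a NumPy array and persists across calls.
w = theano.shared(np.array([1.0, 2.0, 3.0]), name="w")

# Build the symbolic graph and compile it into a callable function.
y = T.dot(w, x)
f = theano.function(inputs=[x], outputs=y)

print(f([1.0, 1.0, 1.0]))   # 6.0
```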

Deep Learning Software - TensorFlow

  • A C++ and Python library for numerical computations typically associated with deep learning (Abadi et al., 2016).

  • Heavily inspired by Theano, it utilizes dataflow graphs in which multidimensional arrays (called tensors) flow between computational operations.
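
For illustration, a minimal sketch using the current TensorFlow 2 Python API (the cited paper describes the earlier explicit graph-building style); the computation shown is an arbitrary example:

```python
import tensorflow as tf

# Tensors (multidimensional arrays) flow through a graph of operations.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])

# tf.function traces the Python code into a dataflow graph.
@tf.function
def matmul_and_sum(x, y):
    return tf.reduce_sum(tf.matmul(x, y))

print(matmul_and_sum(a, b).numpy())   # 10.0
```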