L6_Augmentation and Advanced Computer Vision

Data Pipelines

In deep learning, data handling is crucial due to memory limitations and the presence of two compute units: CPU and GPU. Efficient data handling is essential for training large models on substantial datasets.

Problem: Data often exceeds memory capacity, hindering direct processing.
Opportunity: Utilize both CPU and GPU efficiently to manage and process data.

Implications:

Data must be processed in batches to fit within memory constraints.
The CPU can prepare the subsequent batch while the GPU processes the current one, maximizing resource utilization.

Sequential Processing

The naive approach opens files, reads data, and trains strictly in sequence, repeating this for each epoch. This is inefficient because the GPU sits idle while the CPU prepares data, and the CPU sits idle while the GPU trains.

Prefetching

To optimize, prefetching allows the CPU to prepare the next batch while the GPU is training on the current batch, reducing idle time. This asynchronous data loading significantly speeds up training.
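The idea can be illustrated without any deep learning library. The following is a minimal sketch, assuming a background thread stands in for the CPU-side loader and a sleep stands in for the GPU training step; load_batch, train_step, and prefetching_loader are hypothetical names used only for illustration.

```python
import queue
import threading
import time


def load_batch(i):
    # Stand-in for CPU-side reading and preprocessing of batch i.
    time.sleep(0.01)
    return [i] * 4


def train_step(batch):
    # Stand-in for the GPU training step on one batch.
    time.sleep(0.01)
    return sum(batch)


def prefetching_loader(num_batches, buffer_size=2):
    # A background thread fills a bounded queue while the consumer trains.
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch


results = [train_step(b) for b in prefetching_loader(5)]
print(results)  # [0, 4, 8, 12, 16]
```

The bounded queue plays the role of the prefetch buffer: the loader can run at most buffer_size batches ahead of the training loop.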

Prefetching + Interleaving File Reads

Further optimization involves interleaving file reads, enabling parallel data access from multiple files. This reduces I/O bottlenecks and improves data loading speeds.
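As a rough sketch of interleaving with tf.data (the API introduced in the TensorFlow Dataset section), the example below uses small in-memory ranges as stand-ins for files. The read_file helper is hypothetical; a real pipeline would typically return a file-backed dataset from it.

```python
import tensorflow as tf

# Three hypothetical "files", identified by 0, 1, 2.
file_ids = tf.data.Dataset.range(3)


def read_file(file_id):
    # Stand-in for opening one file: each "file" holds three records.
    return tf.data.Dataset.range(file_id * 10, file_id * 10 + 3)


records = file_ids.interleave(
    read_file,
    cycle_length=3,  # number of sources read concurrently
    block_length=1,  # records taken from a source before moving to the next
    num_parallel_calls=tf.data.AUTOTUNE,
)

values = [int(v) for v in records.as_numpy_iterator()]
print(values)
```

With cycle_length=3 and block_length=1, records from the three sources alternate instead of one source being read to the end before the next is opened.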

Prefetching + Interleaving + Parallel Preprocessing

The most efficient pipeline combines prefetching, interleaving, and parallel preprocessing. Data is then usually ready by the time the GPU needs it, which minimizes wait times.

TensorFlow Dataset

TensorFlow provides the tf.data.Dataset API to streamline data handling. It offers a flexible and efficient way to construct data pipelines for deep learning models.

Most of the effort goes into getting data into a Dataset. Once it is there, the tf.data.Dataset API handles complexities like parallelization and prefetching.
Keras offers convenience functions for standard data types, simplifying the process of creating datasets from common data formats.

Creating a Dataset

Example of creating a Dataset from a list:

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

Applying Transformations

Use the .map() method to apply transformations. This allows you to apply custom functions to each element of the dataset.

def square(x):
    return x * x

new_dataset = dataset.map(square)

Important

Dataset methods return a new Dataset rather than modifying the original in place, so the result must be assigned to a variable. The original dataset remains unchanged.
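A small check of this behavior, as a sketch that reuses the list-based dataset from the earlier example:

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
squared = dataset.map(lambda x: x * x)  # returns a new Dataset

original_values = [int(v) for v in dataset.as_numpy_iterator()]
squared_values = [int(v) for v in squared.as_numpy_iterator()]

print(original_values)  # [1, 2, 3]
print(squared_values)   # [1, 4, 9]
```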

Parallel Processing Chain

Set up a parallel processing chain to optimize data processing:

shuffled_ds = dataset.shuffle(shuffle_buffer_size)  # optional; shuffle() requires a buffer size
preprocessed_ds = shuffled_ds.map(preprocessing_fn, num_parallel_calls=10)
batched_ds = preprocessed_ds.batch(batch_size)
prefetched_ds = batched_ds.prefetch(buffer_size)

Or as a one-liner:

ds = dataset.shuffle(shuffle_buffer_size).map(preprocessing_fn, num_parallel_calls=10).batch(batch_size).prefetch(buffer_size)

tf.data.AUTOTUNE can be used for buffer sizes and parallel calls. This allows TensorFlow to dynamically adjust these parameters for optimal performance.
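A runnable sketch of the same chain with tf.data.AUTOTUNE, assuming a toy dataset of ten integers and a placeholder preprocessing_fn that only rescales values:

```python
import tensorflow as tf


def preprocessing_fn(x):
    # Placeholder preprocessing: cast to float and rescale.
    return tf.cast(x, tf.float32) / 10.0


dataset = tf.data.Dataset.range(10)

ds = (
    dataset
    .shuffle(buffer_size=10, seed=0)
    .map(preprocessing_fn, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

batch_shapes = [batch.numpy().shape for batch in ds]
print(batch_shapes)  # [(4,), (4,), (2,)]
```

The last batch is smaller because ten elements do not divide evenly by the batch size of four; passing drop_remainder=True to batch() would discard it.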

Getting Data into a Dataset

Keras provides utilities for common data types:

Images: keras.utils.image_dataset_from_directory
Time series: keras.utils.timeseries_dataset_from_array
Text: keras.utils.text_dataset_from_directory
Audio: keras.utils.audio_dataset_from_directory
CSV files: Use TensorFlow tutorials for parsing and loading CSV files into a dataset.
Something else: Custom code may be required for more complex or specialized data formats.
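For the "something else" case, one option is tf.data.Dataset.from_generator, which wraps an ordinary Python generator. The sketch below assumes a hypothetical generator that yields (features, label) pairs; the shapes and dtypes in output_signature are illustrative and must match whatever the real generator yields.

```python
import numpy as np
import tensorflow as tf


def sample_generator():
    # Hypothetical custom source: yields (features, label) one sample at a time.
    rng = np.random.default_rng(0)
    for i in range(6):
        yield rng.random(2, dtype=np.float32), np.int32(i % 2)


dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(2,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

batches = list(dataset.batch(4).as_numpy_iterator())
print(len(batches), batches[0][0].shape)
```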

Training a Model

Datasets are used with .fit() to train models. This simplifies the training process by providing a standardized way to feed data to the model.

model.fit(
    train_dataset,
    validation_data=val_dataset,
)
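An end-to-end sketch on synthetic data, assuming a tiny binary classifier; the data, model size, and number of epochs are placeholders chosen only so the example runs quickly:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Synthetic data: 64 samples with 4 features and a binary label.
rng = np.random.default_rng(0)
features = rng.random((64, 4)).astype("float32")
labels = (features.sum(axis=1, keepdims=True) > 2.0).astype("float32")

train_dataset = tf.data.Dataset.from_tensor_slices(
    (features[:48], labels[:48])
).batch(16)
val_dataset = tf.data.Dataset.from_tensor_slices(
    (features[48:], labels[48:])
).batch(16)

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The datasets are already batched, so no batch_size is passed to fit().
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=2,
    verbose=0,
)
print(sorted(history.history.keys()))
```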

Improving Generalization: Augmentation

Problem: Limited training data may not cover all realistic examples, leading to overfitting.
Mitigation: Augment data with artificially modified duplicates to increase the diversity of the training set.

Example: A rotated cat is still a cat. Augmenting images with rotations, flips, and zooms can improve model robustness.

Augmentation

Benefits exceed the effort. Data augmentation is a simple yet effective technique to improve model performance.
Recommended for computer vision. Augmentation helps models generalize better to unseen data.
keras.layers.RandAugment can be used for comprehensive augmentation. This layer applies a random combination of augmentations to each image.
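A minimal augmentation sketch, assuming random noise images as input. It uses the individual layers RandomFlip, RandomRotation, and RandomZoom rather than RandAugment, since those are available across a wider range of Keras versions; the factors are arbitrary example values.

```python
import numpy as np
from tensorflow import keras

augment = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),  # up to +/- 10% of a full turn
    keras.layers.RandomZoom(0.2),      # zoom in or out by up to 20%
])

# A batch of 8 random 32x32 RGB "images" as a stand-in for real data.
images = np.random.default_rng(0).random((8, 32, 32, 3)).astype("float32")

# Augmentation layers are only active when training=True.
augmented = augment(images, training=True)
print(tuple(augmented.shape))  # (8, 32, 32, 3)
```

These layers can be placed at the start of a model or applied in a Dataset .map() call; at inference time they pass images through unchanged.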

Advanced Network Configurations

Going beyond the Sequential model to create more complex and flexible network architectures.

Keras Functional API

Alternative way of defining a network, offering more flexibility. The Functional API allows you to create complex architectures with multiple inputs and outputs.

Example:

Sequential Model:

model = keras.Sequential([
    layers.Input(shape=input_shape),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(128, 3, activation="relu"),
    layers.Conv2D(128, 3, activation="relu"),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])

Functional API:

inputs = layers.Input(shape=input_shape)
x = layers.Conv2D(64, 3, activation="relu")(inputs)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(128, 3, activation="relu")(x)
x = layers.Conv2D(128, 3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

Bird Classifier

Example of combining different types of data (image and numerical). This showcases the flexibility of the Functional API in handling diverse data inputs.

Input: Image of bird and size of bird. Combining image data with numerical features can improve classification accuracy.
Combine convolutional layers for image processing with dense layers for numerical data. This allows the model to learn from both image features and numerical characteristics.
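One way such a two-input model could be wired up with the Functional API is sketched below. The input shapes, layer sizes, and num_classes value are assumptions for illustration, not the architecture from the lecture.

```python
from tensorflow import keras

layers = keras.layers
num_classes = 5  # assumed number of bird species

# Two inputs: the bird image and a single numerical size feature.
image_input = keras.Input(shape=(64, 64, 3), name="image")
size_input = keras.Input(shape=(1,), name="size")

# Convolutional branch for the image.
x = layers.Conv2D(16, 3, activation="relu")(image_input)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Dense branch for the numerical feature.
s = layers.Dense(8, activation="relu")(size_input)

# Merge both branches and classify.
combined = layers.Concatenate()([x, s])
outputs = layers.Dense(num_classes, activation="softmax")(combined)

model = keras.Model(inputs=[image_input, size_input], outputs=outputs)
print(model.output_shape)  # (None, 5)
```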

Non-sequential networks offer:

Multiple inputs, allowing the model to ingest various types of data.
Multiple outputs, enabling the model to perform multiple tasks simultaneously.
Arbitrary layer connections, providing flexibility in designing complex architectures.
Networks inside networks, allowing for hierarchical feature extraction.
Loops, enabling the creation of recurrent neural networks.

Noteworthy Architectures

Inception Networks

Features parallel convolutional layers with different kernel sizes. This allows the network to capture features at different scales.

Output is concatenated, combining features extracted by different kernel sizes.
Example: GoogLeNet (2014), a pioneering architecture in the Inception family.
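A simplified Inception-style block, as a sketch only: it keeps the parallel branches and the concatenation but omits the 1x1 bottleneck convolutions that GoogLeNet places before the larger kernels. The inception_block name and filter counts are illustrative.

```python
from tensorflow import keras

layers = keras.layers


def inception_block(x, filters=16):
    # Parallel branches with different kernel sizes; "same" padding keeps
    # the spatial size identical so the outputs can be concatenated.
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    pool = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    return layers.Concatenate()([b1, b3, b5, pool])


inputs = keras.Input(shape=(32, 32, 8))
outputs = inception_block(inputs)
model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 32, 32, 56)
```

The 56 output channels are the three convolution branches (16 each) plus the 8 channels passed through the pooling branch.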

Residual Networks

Features skip connections, where the input to a group of layers is passed around those layers and added back to their output.

Mitigates vanishing gradients and dead layers, because the skip connections give gradients a direct path through the network.
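A minimal residual block in the Functional API, as a sketch: the residual_block name and filter count are illustrative, and the block assumes the input already has the same number of channels as the convolutions so that the addition is valid.

```python
from tensorflow import keras

layers = keras.layers


def residual_block(x, filters):
    shortcut = x  # the skip connection carries the input forward unchanged
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])  # add the input back in
    return layers.Activation("relu")(y)


inputs = keras.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, filters=16)
model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 32, 32, 16)
```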

Densely Connected Convolutional Networks (DenseNet)

Adds skip connections almost everywhere: within a dense block, each layer receives the concatenated outputs of all preceding layers. This promotes feature reuse and improves information flow.
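The concatenation pattern can be sketched as follows. This is a simplified dense block without the batch normalization and transition layers of the full DenseNet design; dense_block, num_layers, and growth_rate are illustrative names and values.

```python
from tensorflow import keras

layers = keras.layers


def dense_block(x, num_layers=3, growth_rate=8):
    for _ in range(num_layers):
        # Each new layer sees everything produced so far ...
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
        # ... and its output is appended to the running feature stack.
        x = layers.Concatenate()([x, y])
    return x


inputs = keras.Input(shape=(32, 32, 16))
outputs = dense_block(inputs)
model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 32, 32, 40)
```

The channel count grows from 16 to 40 because each of the three layers adds growth_rate=8 feature maps.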

Xception Networks

Relies on depthwise separable convolution layers (keras.layers.SeparableConv2D). This reduces the number of parameters and computational complexity.

Can improve performance, especially when dealing with large images.
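To make the parameter saving concrete, the sketch below compares a regular 3x3 convolution with a depthwise separable one, assuming 32 input channels and 64 output filters (arbitrary example sizes):

```python
from tensorflow import keras

layers = keras.layers

regular = layers.Conv2D(64, 3)
separable = layers.SeparableConv2D(64, 3)

# Build both layers for an input with 32 channels.
input_shape = (None, 32, 32, 32)
regular.build(input_shape)
separable.build(input_shape)

# Regular:   3*3*32*64 weights + 64 biases
# Separable: 3*3*32 depthwise + 32*64 pointwise weights + 64 biases
print(regular.count_params())    # 18496
print(separable.count_params())  # 2400
```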

Keras Applications

Popular computer vision architectures are available as pre-trained models in keras.applications. These pre-trained models can be used for transfer learning and feature extraction.

Excellent starting points for feature extraction, fine-tuning, and transfer learning. Leveraging pre-trained models can save significant training time and resources.
Require specific preprocessing (e.g., keras.applications.xception.preprocess_input()). Each pre-trained model has its own specific input requirements.
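As a small illustration of model-specific preprocessing, the Xception preprocess_input function rescales pixel values from the range 0 to 255 into the range -1 to 1. The sketch below applies it to a tiny dummy array rather than a real image.

```python
import numpy as np
from tensorflow import keras

# Dummy uint8 "image" batch: all black except one white pixel.
images = np.zeros((1, 2, 2, 3), dtype="uint8")
images[0, 0, 0] = 255

preprocessed = keras.applications.xception.preprocess_input(images)

print(float(preprocessed.min()), float(preprocessed.max()))  # -1.0 1.0
```

Other architectures in keras.applications expect different scaling, so the matching preprocess_input function should be used for each model.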

Other Computer Vision Tasks

Convolutional nets are used for more than just classification. They are versatile tools for various computer vision tasks.

Segmentation

Classify each pixel, assigning a class label to every pixel in the image. This allows for detailed image understanding.

Output must have the same dimensions as the input. The output is a pixel-wise classification map.
Uses an encoder-decoder structure (e.g., U-Net (2015)). The encoder reduces the spatial dimensions, while the decoder reconstructs the segmentation map.
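A toy encoder-decoder, as a sketch of how the output regains the input's spatial size. It is far smaller than U-Net and omits U-Net's skip connections between encoder and decoder; the input size and num_classes are assumed values.

```python
from tensorflow import keras

layers = keras.layers
num_classes = 3  # assumed number of pixel classes

inputs = keras.Input(shape=(64, 64, 3))

# Encoder: strided convolutions halve the spatial size twice (64 -> 32 -> 16).
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)

# Decoder: transposed convolutions double it again (16 -> 32 -> 64).
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)

# One softmax over the classes for every pixel.
outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)

model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 64, 64, 3)
```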

Object Detection

Detects multiple objects and indicates their locations with bounding boxes.

Oriented Bounding Boxes

Object detection with bounding boxes that can be rotated by an angle, allowing more precise localization of objects such as text in images.

Semantic Segmentation

Like segmentation, it classifies each pixel but it doesn't distinguish different instances of the same object type.

Instance Segmentation

Like semantic segmentation, but it distinguishes different instances of the same object type.

Pose Estimation

Identifies the pose or orientation of an object, often used to track human movement or analyze object positioning.
