L6_Augmentation and Advanced Computer Vision
Data Pipelines
In deep learning, data handling is crucial due to memory limitations and the presence of two compute units: CPU and GPU. Efficient data handling is essential for training large models on substantial datasets.
Problem: Data often exceeds memory capacity, hindering direct processing.
Opportunity: Utilize both CPU and GPU efficiently to manage and process data.
Implications:
Data must be processed in batches to fit within memory constraints.
The CPU can prepare the subsequent batch while the GPU processes the current one, maximizing resource utilization.
Sequential Processing
The naive approach opens, reads, and trains sequentially in each epoch. This is inefficient because the GPU sits idle while the CPU prepares data, and the CPU sits idle while the GPU trains.
Prefetching
To optimize, prefetching allows the CPU to prepare the next batch while the GPU is training on the current batch, reducing idle time. This asynchronous data loading significantly speeds up training.
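The idea can be illustrated without any deep learning library. The following is a minimal sketch, assuming a producer thread stands in for the CPU and the main loop stands in for the GPU; load_batch, train_step, and prefetching_batches are hypothetical names, and the sleeps only simulate work.

```python
import queue
import threading
import time


def load_batch(i):
    """Stand-in for CPU-side loading and preprocessing (hypothetical helper)."""
    time.sleep(0.01)
    return [i] * 4


def train_step(batch):
    """Stand-in for the GPU training step (hypothetical helper)."""
    time.sleep(0.01)
    return sum(batch)


def prefetching_batches(num_batches, buffer_size=2):
    """Yield batches while a background thread prepares the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the data

    def producer():
        for i in range(num_batches):
            buffer.put(load_batch(i))  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


# The consumer trains on one batch while the producer loads the next.
results = [train_step(batch) for batch in prefetching_batches(5)]
```

The bounded queue plays the role of the prefetch buffer: the producer runs ahead by at most buffer_size batches.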
Prefetching + Interleaving File Reads
Further optimization involves interleaving file reads, enabling parallel data access from multiple files. This reduces I/O bottlenecks and improves data loading speeds.
Prefetching + Interleaving + Parallel Preprocessing
Achieve maximum efficiency by combining prefetching, interleaving, and parallel preprocessing. This ensures that data is readily available for the GPU, eliminating wait times.
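As a rough sketch of how the three techniques combine, the example below uses the tf.data API covered in the TensorFlow Dataset section of these notes. The two small text files and the parse function are assumptions made up for illustration; real pipelines would read actual data files.

```python
import os
import tempfile

import tensorflow as tf

# Create two small text files to read from (illustrative data only).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, lines in [("a.txt", ["1", "2", "3"]), ("b.txt", ["10", "20", "30"])]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write("\n".join(lines))
    paths.append(path)


def parse(line):
    # Hypothetical preprocessing: parse the text line and double the value.
    return tf.strings.to_number(line, out_type=tf.int32) * 2


ds = (
    tf.data.Dataset.from_tensor_slices(paths)
    # Interleaving: read lines from both files concurrently.
    .interleave(
        lambda p: tf.data.TextLineDataset(p),
        cycle_length=2,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    # Parallel preprocessing: run parse on several elements at once.
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
    # Prefetching: prepare the next batch while the current one is consumed.
    .prefetch(tf.data.AUTOTUNE)
)

values = sorted(int(v) for batch in ds.as_numpy_iterator() for v in batch)
```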
TensorFlow Dataset
TensorFlow provides the tf.data.Dataset API to streamline data handling. It offers a flexible and efficient way to construct data pipelines for deep learning models.
Effort is primarily in getting data into a Dataset. The tf.data.Dataset API handles complexities like parallelization and prefetching.
Keras offers convenience functions for standard data types, simplifying the process of creating datasets from common data formats.
Creating a Dataset
Example of creating a Dataset from a list:
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
Applying Transformations
Use the .map() method to apply transformations. This allows you to apply custom functions to each element of the dataset.
def square(x):
    return x * x

new_dataset = dataset.map(square)
Important
Dataset methods return a new Dataset rather than modifying in place. This ensures that the original dataset remains unchanged.
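A small check makes this behavior concrete. This is a sketch that iterates both datasets after mapping; the variable names are chosen here for illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
squared = dataset.map(lambda x: x * x)  # returns a new Dataset

# The original dataset still yields the unmodified values.
original_values = [int(v) for v in dataset.as_numpy_iterator()]
squared_values = [int(v) for v in squared.as_numpy_iterator()]
```

Because nothing is modified in place, forgetting to assign the result of .map() silently discards the transformation.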
Parallel Processing Chain
Set up a parallel processing chain to optimize data processing:
shuffled_ds = dataset.shuffle(shuffle_buffer_size)  # optional; shuffle() requires a buffer size
preprocessed_ds = shuffled_ds.map(preprocessing_fn, num_parallel_calls=10)
batched_ds = preprocessed_ds.batch(batch_size)
prefetched_ds = batched_ds.prefetch(buffer_size)
Or as a one-liner:
ds = dataset.shuffle(shuffle_buffer_size).map(preprocessing_fn).batch(batch_size).prefetch(buffer_size)
tf.data.AUTOTUNE can be used for buffer sizes and parallel calls. This allows TensorFlow to dynamically adjust these parameters for optimal performance.
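A runnable version of the chain might look like the sketch below. The toy range dataset, the preprocessing_fn body, and the batch size of 4 are assumptions chosen only to keep the example small.

```python
import tensorflow as tf


def preprocessing_fn(x):
    # Hypothetical preprocessing: convert integers to floats in [0, 1).
    return tf.cast(x, tf.float32) / 10.0


dataset = tf.data.Dataset.range(10)

ds = (
    dataset.shuffle(buffer_size=10, seed=0)  # buffer covers the whole toy dataset
    .map(preprocessing_fn, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

# 10 elements in batches of 4 give two full batches and one partial batch.
batch_sizes = [int(batch.shape[0]) for batch in ds]
```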
Getting Data into a Dataset
Keras provides utilities for common data types:
Images: keras.utils.image_dataset_from_directory
Time series: keras.utils.timeseries_dataset_from_array
Text: keras.utils.text_dataset_from_directory
Audio: keras.utils.audio_dataset_from_directory
CSV files: Use TensorFlow tutorials for parsing and loading CSV files into a dataset.
Something else: Custom code may be required for more complex or specialized data formats.
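For the "something else" case, one possible route is tf.data.Dataset.from_generator, which wraps an ordinary Python generator. The sketch below assumes a made-up source that yields a two-element feature vector and an integer label; record_generator and its contents are hypothetical.

```python
import tensorflow as tf


def record_generator():
    # Hypothetical custom source: yields (features, label) pairs.
    for i in range(3):
        yield [float(i), float(i) + 0.5], i % 2


ds = tf.data.Dataset.from_generator(
    record_generator,
    # The signature tells TensorFlow the shape and dtype of each element.
    output_signature=(
        tf.TensorSpec(shape=(2,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

labels = [int(label) for _, label in ds]
```

The resulting Dataset supports the same .map(), .batch(), and .prefetch() calls as any other.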
Training a Model
Datasets are used with .fit() to train models. This simplifies the training process by providing a standardized way to feed data to the model.
model.fit(
    train_dataset,
    validation_data=val_dataset,
)
Improving Generalization: Augmentation
Problem: Limited training data may not cover all realistic examples, leading to overfitting.
Mitigation: Augment data with artificially modified duplicates to increase the diversity of the training set.
Example: A rotated cat is still a cat. Augmenting images with rotations, flips, and zooms can improve model robustness.
Augmentation
Benefits exceed the effort. Data augmentation is a simple yet effective technique to improve model performance.
Recommended for computer vision. Augmentation helps models generalize better to unseen data.
keras.layers.RandAugment can be used for comprehensive augmentation. This layer applies a random combination of augmentations to each image.
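As a sketch of the rotations, flips, and zooms mentioned above, the example below chains individual Keras preprocessing layers rather than RandAugment. The factors 0.1 and 0.2 and the random input images are assumptions for illustration, not recommended settings.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Each layer applies a random transformation when called with training=True.
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),  # rotate by up to 10% of a full turn
    layers.RandomZoom(0.2),      # zoom in or out by up to 20%
])

# Stand-in batch of 8 RGB images of size 32x32.
images = np.random.rand(8, 32, 32, 3).astype("float32")
augmented = augment(images, training=True)
```

The augmented batch keeps the shape of the input, so the labels can be reused unchanged.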
Advanced Network Configurations
Going beyond the Sequential model to create more complex and flexible network architectures.
Keras Functional API
Alternative way of defining a network, offering more flexibility. The Functional API allows you to create complex architectures with multiple inputs and outputs.
Example:
Sequential Model:
model = keras.Sequential([
    layers.Input(shape=input_shape),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(128, 3, activation="relu"),
    layers.Conv2D(128, 3, activation="relu"),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
Functional API:
inputs = layers.Input(shape=input_shape)
x = layers.Conv2D(64, 3, activation="relu")(inputs)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(128, 3, activation="relu")(x)
x = layers.Conv2D(128, 3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
Bird Classifier
Example of combining different types of data (image and numerical). This showcases the flexibility of the Functional API in handling diverse data inputs.
Input: Image of bird and size of bird. Combining image data with numerical features can improve classification accuracy.
Combine convolutional layers for image processing with dense layers for numerical data. This allows the model to learn from both image features and numerical characteristics.
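A two-input model along these lines could be sketched as follows. The image size, layer widths, and num_species value are assumptions for illustration; the notes do not specify the actual architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_species = 5  # hypothetical number of bird classes

# Image branch: convolutional feature extraction.
image_input = keras.Input(shape=(64, 64, 3), name="image")
x = layers.Conv2D(16, 3, activation="relu")(image_input)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Numerical branch: a dense layer on the bird's size.
size_input = keras.Input(shape=(1,), name="size")
s = layers.Dense(8, activation="relu")(size_input)

# Merge both branches and classify.
combined = layers.Concatenate()([x, s])
outputs = layers.Dense(num_species, activation="softmax")(combined)
model = keras.Model(inputs=[image_input, size_input], outputs=outputs)

# Forward pass on dummy data: two images and two sizes.
preds = model([
    np.zeros((2, 64, 64, 3), dtype="float32"),
    np.zeros((2, 1), dtype="float32"),
])
```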
Non-sequential networks offer:
Multiple inputs, allowing the model to ingest various types of data.
Multiple outputs, enabling the model to perform multiple tasks simultaneously.
Arbitrary layer connections, providing flexibility in designing complex architectures.
Networks inside networks, allowing for hierarchical feature extraction.
Loops, enabling the creation of recurrent neural networks.
Noteworthy Architectures
Inception Networks
Features parallel convolutional layers with different kernel sizes. This allows the network to capture features at different scales.
Output is concatenated, combining features extracted by different kernel sizes.
Example: GoogLeNet (2014), a pioneering architecture in the Inception family.
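The parallel-branch idea can be sketched with the Functional API. This is a simplified block, not the GoogLeNet module itself: it omits the 1x1 bottleneck convolutions and the pooling branch, and the filter count is an arbitrary choice.

```python
from tensorflow import keras
from tensorflow.keras import layers


def inception_block(x, filters):
    """Simplified Inception-style block (hypothetical helper)."""
    # Parallel convolutions with different kernel sizes on the same input.
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    # Concatenate along the channel axis.
    return layers.Concatenate()([b1, b3, b5])


inputs = keras.Input(shape=(32, 32, 3))
outputs = inception_block(inputs, filters=16)
model = keras.Model(inputs, outputs)
```

With padding="same" all branches keep the spatial size, so the concatenated output has 3 x 16 = 48 channels.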
Residual Networks
Features skip connections, where data is passed around layers and added back in.
Mitigates vanishing gradients and dead layers, because skip connections give gradients a more direct path through the network.
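A skip connection is short to write with the Functional API. The block below is a minimal sketch that assumes the input already has the same number of channels as the convolutions produce, so the addition is shape-compatible; residual_block is a hypothetical helper.

```python
from tensorflow import keras
from tensorflow.keras import layers


def residual_block(x, filters):
    """Minimal residual block; assumes x already has `filters` channels."""
    shortcut = x  # the data that is passed around the layers
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])  # add the skipped input back in
    return layers.Activation("relu")(y)


inputs = keras.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, filters=16)
model = keras.Model(inputs, outputs)
```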
Densely Connected Convolutional Networks (DenseNet)
Adds skip connections almost everywhere, connecting each layer to every subsequent layer within a dense block by concatenating feature maps. This promotes feature reuse and improves information flow.
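A dense block can be sketched as a loop in which every new layer sees the concatenation of everything before it. The number of layers and the growth rate below are arbitrary illustrative values, and dense_block is a hypothetical helper that omits the batch normalization and transition layers of the full architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers


def dense_block(x, num_layers, growth_rate):
    """Simplified dense block: each layer's output is appended to its input."""
    for _ in range(num_layers):
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
        # Later layers receive the feature maps of all earlier layers.
        x = layers.Concatenate()([x, y])
    return x


inputs = keras.Input(shape=(32, 32, 16))
outputs = dense_block(inputs, num_layers=3, growth_rate=8)
model = keras.Model(inputs, outputs)
```

The channel count grows by growth_rate per layer: 16 + 3 x 8 = 40 here.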
Xception Networks
Relies on depthwise separable convolution layers (keras.layers.SeparableConv2D). This reduces the number of parameters and computational complexity.
Can improve performance, especially when dealing with large images.
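The parameter savings can be checked directly by comparing a regular and a separable convolution with the same output size. The input with 32 channels and the 64 filters are assumptions chosen for the comparison.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 32))  # 32 input channels

regular = layers.Conv2D(64, 3)
separable = layers.SeparableConv2D(64, 3)

# Calling the layers builds their weights.
regular(inputs)
separable(inputs)

# Regular: 3*3*32*64 weights + 64 biases = 18496
regular_params = regular.count_params()
# Separable: 3*3*32 depthwise + 32*64 pointwise + 64 biases = 2400
separable_params = separable.count_params()
```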
Keras Applications
Popular computer vision architectures are available as pre-trained models in keras.applications. These pre-trained models can be used for transfer learning and feature extraction.
Excellent starting points for feature extraction, fine-tuning, and transfer learning. Leveraging pre-trained models can save significant training time and resources.
Require specific preprocessing (e.g., keras.applications.xception.preprocess_input()). Each pre-trained model has its own specific input requirements.
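To see what such a preprocessing function does, it can be applied to a few known pixel values. The sketch below uses a single 1x1 RGB "image" as made-up input; for Xception the function rescales pixel values from the range 0 to 255 to the range -1 to 1, while other models may use different schemes.

```python
import numpy as np
from tensorflow import keras

# One 1x1 image whose three channels hold the minimum, middle, and maximum value.
pixels = np.array([[[[0.0, 127.5, 255.0]]]], dtype="float32")

# Pass a copy, since preprocessing may modify a float array in place.
scaled = keras.applications.xception.preprocess_input(pixels.copy())
scaled_values = np.asarray(scaled).flatten().tolist()
```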
Other Computer Vision Tasks
Convolutional nets are used for more than just classification. They are versatile tools for various computer vision tasks.
Segmentation
Classify each pixel, assigning a class label to every pixel in the image. This allows for detailed image understanding.
Output must have the same spatial dimensions (height and width) as the input. The output is a pixel-wise classification map.
Uses an encoder-decoder structure (e.g., U-Net (2015)). The encoder reduces the spatial dimensions, while the decoder reconstructs the segmentation map.
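The encoder-decoder shape logic can be sketched with a tiny model. This is a heavily reduced illustration with one downsampling and one upsampling step plus a U-Net-style skip connection; the input size, filter counts, and num_classes are assumptions, and a real U-Net is considerably deeper.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 4  # hypothetical number of pixel classes

inputs = keras.Input(shape=(64, 64, 3))

# Encoder: extract features and halve the spatial resolution.
e1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
p1 = layers.MaxPooling2D(2)(e1)
bottleneck = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)

# Decoder: upsample back to the input resolution.
u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(bottleneck)
u1 = layers.Concatenate()([u1, e1])  # skip connection from the encoder
d1 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

# One softmax over the classes for every pixel.
outputs = layers.Conv2D(num_classes, 1, activation="softmax")(d1)
model = keras.Model(inputs, outputs)
```

The output has the same height and width as the input, with one class probability vector per pixel.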
Object Detection
Detects multiple objects and indicates their locations with bounding boxes.
Oriented Bounding Boxes
Object detection with bounding boxes that allows a rotation angle for more precise localization of objects such as text in images.
Semantic Segmentation
Like segmentation, it classifies each pixel but it doesn't distinguish different instances of the same object type.
Instance Segmentation
Like semantic segmentation, but it distinguishes different instances of the same object type.
Pose Estimation
Identifies the pose or orientation of an object, often used to track human movement or analyze object positioning.