Generative Adversarial Networks (GANs)

Module covering image generation, style transfer, and recent advancements in GANs.

GAN Basics

GANs consist of a generator and a discriminator competing against each other.

  • Generator: Creates fake data.

  • Discriminator: Tries to distinguish between real and fake data.

GAN Types for Computer Vision

  • Vanilla GANs

  • CycleGAN

  • Pix2pix GAN

  • StyleGAN

  • Deep Convolutional GAN (DCGAN)

  • Conditional GAN (CGAN)

  • Pixel Recurrent Neural Network (PixelRNN)

  • DiscoGAN

  • Super Resolution GAN (SRGAN)

  • InfoGAN

  • StackGAN

Mathematical Formula for GANs

The mathematical representation of GANs involves a value function V(D, G), where D is the discriminator and G is the generator. The formula is:

V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]

  • G: Generator

  • D: Discriminator

  • P_{data}(x): Distribution of real data

  • p(z): Noise (prior) distribution that the generator samples from

  • x: Sample from P_{data}(x)

  • z: Sample from p(z)

  • D(x): Discriminator's output, the estimated probability that x is real

  • G(z): Generator's output, a sample produced from noise z
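As a concrete illustration, V(D, G) can be estimated by Monte Carlo sampling. The sketch below is a toy one-dimensional setup with a hand-picked logistic discriminator and a fixed linear generator; the distributions and functions are illustrative assumptions, not part of any standard implementation.

```python
import math
import random

random.seed(0)

def D(x):
    # Illustrative discriminator: logistic function, higher output for larger x
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))

def G(z):
    # Illustrative linear generator mapping noise to samples concentrated near 0
    return 0.5 * z

def value_function(m=10_000):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(D(random.gauss(4.0, 1.0))) for _ in range(m)) / m
    fake_term = sum(math.log(1.0 - D(G(random.gauss(0.0, 1.0)))) for _ in range(m)) / m
    return real_term + fake_term

v = value_function()
print(f"V(D, G) estimate: {v:.3f}")
```

Both expectations are logarithms of probabilities in (0, 1), so the estimate is always negative; a discriminator that classifies everything correctly would push it toward 0.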

GAN Objective

The GAN aims to solve a minimax problem:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]

where:

  • The discriminator tries to maximize the probability of correctly classifying real and fake data.

  • The generator tries to minimize the probability of the discriminator correctly classifying generated data as fake.

The discriminator aims to identify real samples:

  • Maximize log D(x) (discriminator output on real data).

The generator aims to fool the discriminator:

  • Minimize log(1 - D(G(z))) (discriminator output on generated data).

GAN Training Process

  1. Discriminator's Perspective:

    • The discriminator tries to tell samples from the real dataset apart from samples produced by the generator.

    • Early in training the discriminator's loss can dominate (poor fakes are easy to reject), which saturates the generator's gradients and makes its learning difficult.

  2. Generator's Perspective:

    • Generator tries to produce samples that the discriminator classifies as real.

  3. Equilibrium:

    • At equilibrium, generated samples are indistinguishable from real data and the discriminator outputs 1/2 for every input.
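The difficulty in step 1 can be made concrete. With a sigmoid discriminator, the gradient of the original generator loss log(1 - D(G(z))) with respect to the discriminator's logit is -D(G(z)), which vanishes exactly when the discriminator confidently rejects the fake; the widely used non-saturating alternative, minimizing -log D(G(z)), keeps a strong gradient in that regime. A quick numeric check (a sketch, not tied to any particular framework):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# s is the discriminator's logit on a generated sample; a very negative s
# means the discriminator confidently calls the sample fake (D(G(z)) near 0).
for s in (0.0, -2.0, -5.0):
    d = sigmoid(s)
    grad_saturating = -d              # d/ds of log(1 - sigmoid(s))
    grad_non_saturating = -(1.0 - d)  # d/ds of -log(sigmoid(s))
    print(f"D(G(z))={d:.3f}  saturating={grad_saturating:.4f}  "
          f"non-saturating={grad_non_saturating:.4f}")
```

As the discriminator becomes more confident, the saturating gradient shrinks toward 0 while the non-saturating one approaches -1.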

Implementation Details

  1. Components:

    • Generator (G)

    • Discriminator (D)

  2. Loss Function:

    • Binary cross-entropy

  3. Optimization:

    • Adam optimizer

  4. Training Loop:

    • Epoch-wise training

    • Save the model periodically

  5. Steps:

    • Sample real data and noise.

    • Generate samples using the generator.

    • Calculate discriminator loss for real and fake samples: D_{loss} = D_{lossReal} + D_{lossFake}

    • Compute gradient and update discriminator weights (backpropagation).

    • Calculate generator loss.

    • Compute gradient and update generator weights (backpropagation).
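The six steps above can be sketched end to end. The toy example below trains a linear generator against a logistic-regression discriminator on one-dimensional Gaussian data; all modelling choices (1-D data, linear G, plain SGD instead of the Adam optimizer mentioned above, the non-saturating generator loss, hand-written gradients) are simplifications for illustration.

```python
import math
import random

random.seed(42)

def sigmoid(s):
    # Numerically stable logistic function
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

# Toy setup: real data ~ N(4, 0.5); generator G(z) = wg*z + bg with z ~ N(0, 1);
# discriminator D(x) = sigmoid(wd*x + bd).
wg, bg = 1.0, 0.0            # generator parameters (theta_g)
wd, bd = 0.1, 0.0            # discriminator parameters (theta_d)
lr, batch, epochs = 0.05, 64, 300

for _ in range(epochs):
    xs = [random.gauss(4.0, 0.5) for _ in range(batch)]   # 1. sample real data
    zs = [random.gauss(0.0, 1.0) for _ in range(batch)]   #    ... and noise
    gs = [wg * z + bg for z in zs]                        # 2. generate samples

    # 3./4. discriminator: binary cross-entropy, D_loss = D_lossReal + D_lossFake
    gwd = gbd = 0.0
    for x in xs:                       # real samples, target label 1
        p = sigmoid(wd * x + bd)
        gwd += -(1.0 - p) * x / batch
        gbd += -(1.0 - p) / batch
    for g in gs:                       # fake samples, target label 0
        q = sigmoid(wd * g + bd)
        gwd += q * g / batch
        gbd += q / batch
    wd -= lr * gwd
    bd -= lr * gbd

    # 5./6. generator: non-saturating loss -log D(G(z)), i.e. make D call fakes real
    gwg = gbg = 0.0
    for z, g in zip(zs, gs):
        q = sigmoid(wd * g + bd)
        dg = -(1.0 - q) * wd           # dLoss/dG(z)
        gwg += dg * z / batch
        gbg += dg / batch
    wg -= lr * gwg
    bg -= lr * gbg

fake_mean = sum(wg * random.gauss(0.0, 1.0) + bg for _ in range(2000)) / 2000
print(f"generated mean: {fake_mean:.2f}  (real mean is 4.0)")
```

As training alternates between the two updates, the generated mean should drift from 0 toward the real mean, illustrating the adversarial dynamic without any deep-learning dependencies.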

Types of GANs

Vanilla GANs
  • Simplest form of GANs, using multi-layer neural networks for both generator and discriminator.

  • Used for text generation and data augmentation.

  • Advantages: Simple to implement, performs well in a wide range of applications.

  • Disadvantages: Unstable and can easily fail to converge.

Deep Convolutional GANs (DCGAN)
  • Uses deep convolutional neural networks.

Conditional GANs (CGAN)
  • GANs conditioned on extra information like labels.

  • Example: Generating only Mercedes cars from a dataset of various cars by specifying "Mercedes" as a condition.

  • Advantages: Generator can be tuned to generate samples more similar to training data.

  • Extra information can improve the quality of generated samples.
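The conditioning mechanism can be sketched at the input level: the label is typically encoded (for example one-hot) and concatenated with the noise vector before it enters the generator. The class list and dimensions below are illustrative assumptions, with "Mercedes" mirroring the example above.

```python
import random

random.seed(0)

# Hypothetical class vocabulary and noise size, chosen for illustration.
classes = ["Audi", "BMW", "Mercedes"]
noise_dim = 8

def one_hot(label):
    v = [0.0] * len(classes)
    v[classes.index(label)] = 1.0
    return v

def conditioned_input(label):
    """Generator input for a CGAN: noise z concatenated with the condition y."""
    z = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label)          # the generator then computes G(z, y)

x = conditioned_input("Mercedes")
print(len(x), x[noise_dim:])           # prints: 11 [0.0, 0.0, 1.0]
```

The discriminator receives the same condition alongside its image input, so both networks learn label-dependent behavior.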

Super Resolution GANs (SRGAN)
  • Used to create high-resolution images from low-resolution inputs.

  • A generator network upscales the image while a competing discriminator network judges whether the result looks like a genuine high-resolution image.

  • Advantage: Produces high-resolution images without additional data.

Pixel Recurrent Neural Network (PixelRNN)
  • Autoregressive (non-adversarial) generative networks that sequentially predict the pixels in an image along the two spatial dimensions.

  • Model:

    • Models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image.

  • Variants:

    • The Row LSTM and the Diagonal BiLSTM, which scale more easily to larger datasets.

  • Pixel values:

    • Pixel values are treated as discrete random variables by using a softmax layer in the conditional distributions.

  • Masked convolutions:

    • Masked convolutions are employed to allow PixelRNNs to model full dependencies between the color channels.

  • Process:

    • Predicts the conditional distribution over the possible pixel values given the scanned context.

    • p(x_i | x_1, …, x_{i-1}) is the probability of the i-th pixel given all previously generated pixels x_1 through x_{i-1}.

    • Each pixel's distribution factorizes over its color channels: p(x_i | x_{<i}) = p(x_{i,R} | x_{<i}) p(x_{i,G} | x_{<i}, x_{i,R}) p(x_{i,B} | x_{<i}, x_{i,R}, x_{i,G})
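The masked convolution can be illustrated with the spatial mask alone (the per-channel R, G, B masking extends the same idea across channels). This is a minimal sketch, not the full PixelRNN implementation:

```python
def conv_mask(kernel_size, mask_type):
    """PixelRNN/PixelCNN-style mask for a square convolution kernel.

    Both types zero out every weight that comes after the center pixel in
    raster-scan order, so a pixel never sees its own future; type "A"
    (used in the first layer) additionally blocks the center pixel itself.
    """
    k = kernel_size
    center = k // 2
    mask = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            after_center = i > center or (i == center and j > center)
            if after_center or (mask_type == "A" and (i, j) == (center, center)):
                mask[i][j] = 0.0
    return mask

for row in conv_mask(3, "A"):
    print(row)
```

Multiplying a convolution kernel elementwise by this mask before applying it enforces the raster-scan dependency structure described above.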

DiscoGAN
  • A GAN variant that discovers cross-domain relations, learning to translate images from one domain to another (e.g., handbags to shoes) without paired training data.

  • Consists of two generator and discriminator pairs trained jointly, with a reconstruction loss tying the two domains together.

InfoGAN
  • Augments the GAN loss with a mutual-information term between part of the latent code and the generated image, so outputs are not only realistic but also controlled by interpretable latent factors.

StackGAN
  • Uses multiple generators and discriminators stacked in a series.

Industrial Applications of GANs

  • Healthcare:

    • Medical image synthesis for enhanced diagnosis and treatment planning.

    • Drug discovery and development through generative models.

    • Generating synthetic patient data for privacy-preserving research.

    • Augmenting medical education with interactive virtual simulations.

  • Gaming:

    • Procedural generation of game environments and levels.

    • AI-generated music and sound effects.

    • AI-generated non-player characters (NPCs) with realistic behaviors.

    • AI-assisted game design and development tools.

  • Fashion:

    • Virtual fashion try-on experiences.

    • AI-generated designs for trend forecasting.

    • Automated fashion styling recommendations.

    • Customized pattern generation for unique garment production.

  • Marketing:

    • Personalized content creation for targeted marketing campaigns.

    • AI-powered recommendation systems.

    • Automated ad generation.

    • Chatbot and virtual assistant integration.

  • Manufacturing:

    • Product design: AI-generated prototypes and designs.

    • Automated inspection: AI-powered visual inspection for defects.

    • Predictive maintenance: Identifying equipment failures before they occur.

    • Process optimization: AI-based optimization of manufacturing processes.

Generative Model

  • Generative network transforms a simple random variable into a more complex one.

    • Input random variable: z \sim P_{prior}(z)

    • Generator function: G(z; \theta_g)

    • Output random variable: x \sim P_g(x; \theta_g)
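A fixed, hand-derived example of this transformation is inverse-transform sampling: a uniform random variable is mapped through a function into an exponential distribution. A GAN generator plays the same role, except the mapping is learned rather than derived (the rate parameter below is an arbitrary illustrative choice):

```python
import math
import random

random.seed(1)

def G(z, rate=2.0):
    # Inverse CDF of the exponential distribution: maps z ~ Uniform(0, 1)
    # to an Exponential(rate) sample.
    return -math.log(1.0 - z) / rate

samples = [G(random.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(f"sample mean: {mean:.3f}  (Exponential(2) has mean 0.5)")
```

Here the simple input variable z is uniform and the output variable follows a more complex distribution, exactly the role played by z \sim P_{prior}(z) and x \sim P_g above.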

GAN Architecture

  • Latent random variable z \sim P_{prior}(z)

  • Real world images x \sim P_{data}(x)

  • Generator: G(z; \theta_g)

  • Discriminator: D(x; \theta_d)

Terminology

  • P_{data}(x): Data distribution

  • P_g(x): Generated distribution

  • P_{prior}(z): Noise distribution

  • D(x; \theta_d): Discriminator function with parameters \theta_d

  • G(z; \theta_g): Generator function with parameters \theta_g

Discriminator Learning

  • Predict "1" for true images and "0" for fake images.

  • Loss function: V' = \frac{1}{m} \sum_{i=1}^{m} [\log D(x^{(i)}) + \log(1 - D(G(z^{(i)})))]

  • Gradient ascent: \theta_d \leftarrow \theta_d + \eta \nabla V'(\theta_d)

Generator Learning

  • Generator needs to fool the discriminator.

  • The generator succeeds when the discriminator outputs "1" for fake images.

  • Loss function: V' = \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(z^{(i)})))

  • Gradient descent: \theta_g \leftarrow \theta_g - \eta \nabla V'(\theta_g)

Learning Algorithm

  • Minibatch stochastic gradient descent training.

  • Update discriminator k steps, then update generator 1 step.

GAN Training Steps

  1. Discriminator Training:

    • Fix the generator.

    • Update discriminator parameters using backpropagation.

  2. Generator Training:

    • Fix the discriminator.

    • Update generator parameters using backpropagation so that the discriminator classifies generated images as "real".

GAN Takeaways

  • Generator models try to generate data from a given probability distribution.

  • Generator tries to model the input data probability distribution: P_g(x) = P_{data}(x)

  • GANs use an adversarial method to train the generator.

Advancements in GANs and Style Transfer

  • Gated-GAN:

    • Controls information flow between generator and discriminator, generating images with multiple styles.

  • CycleGANs:

    • Unsupervised image-to-image translation without paired training data.

  • HST-GAN:

    • Historical style transfer GAN for generating historical text images.

  • Few-shot learning:

    • Generating high-quality images with few training examples.

  • Improved GAN Technique for Style Transfer:

    • Preserves the subject matter of one image while mapping the artistic style of another onto it.

Deepfakes (Generative adversarial network)

  • GANs impact industries dealing with data and images.

  • Deepfakes: Realistic fake photos or face replacements using deep learning.

  • Issue: Potential to spread misinformation.

  • Mitigation: Adopt ethical guidelines, focus on specific case uses, protect human rights.

Ethics of Deepfake Technology

  • Address potential misuse, such as celebrity deepfakes.

  • Develop solutions to detect deepfakes.

  • Consider regulations to tag deepfake content.