GAN Loss Functions (All in One)

In a nutshell: the generator G minimizes the same objective that the discriminator D maximizes (min over G, max over D).

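Interpretation: in the original GAN, D(·) outputs the probability (between 0 and 1) that its input is real. The LSGAN and WGAN variants further down typically use unbounded real-valued scores instead.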

1. Original GAN (Minimax)

$$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] $$

Step 1: Update Discriminator

$$ \max_D \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))] $$

$$ \mathcal{L}_D = - \mathbb{E}_{x}[\log D(x)] - \mathbb{E}_{z}[\log(1 - D(G(z)))] $$
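The discriminator loss above is a binary cross-entropy with target 1 on real samples and target 0 on generated ones. The following is a minimal sketch in plain Python, not a training-ready implementation; `d_real` and `d_fake` are hypothetical lists holding the discriminator outputs D(x) and D(G(z)) for one batch.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Batch estimate of L_D = -E[log D(x)] - E[log(1 - D(G(z)))].

    d_real, d_fake: hypothetical batches of probabilities in (0, 1).
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives a loss of 2 * log(2).
loss_at_chance = discriminator_loss([0.5, 0.5], [0.5, 0.5])

# A discriminator that separates the two batches well gets a lower loss.
loss_confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
print(loss_at_chance, loss_confident)
```

The BCE-based PyTorch example later in these notes computes the same quantity, up to the factor of 1/2 it applies to the sum.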

Step 2: Update Generator

(a) Minimax:

$$ \mathcal{L}_G = \mathbb{E}_{z}[\log(1 - D(G(z)))] $$

(b) Non-saturating:

$$ \mathcal{L}_G = - \mathbb{E}_{z}[\log D(G(z))] $$
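To see why variant (b) is called non-saturating, write D(G(z)) = sigmoid(a) for a discriminator logit a (the logit parameterization is an assumption made for this illustration). Then the derivative of loss (a) with respect to a is -sigmoid(a), while the derivative of loss (b) is sigmoid(a) - 1. When the discriminator confidently rejects a fake (a very negative), the first gradient vanishes and the second stays close to -1. A small sketch:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_minimax(a):
    """d/da of log(1 - sigmoid(a)), the minimax generator loss (a)."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the non-saturating generator loss (b)."""
    return sigmoid(a) - 1.0

# a = -6: D is confident the sample is fake; a = 0: D is undecided.
for a in (-6.0, 0.0, 6.0):
    print(a, grad_minimax(a), grad_non_saturating(a))
```

Both losses push a in the same direction (both gradients are negative, so gradient descent increases a); they differ only in how strong the signal is when the generator is losing.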


PyTorch Example


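import torch
import torch.nn as nn

# Assumed to be defined elsewhere: gen, disc (ending in a sigmoid), loader,
# opt_gen, opt_disc, device, z_dim, num_epochs.
criterion = nn.BCELoss()

for epoch in range(num_epochs):
    for batch_idx, (real, _) in enumerate(loader):
        # Flatten each image to a 784-dimensional vector (e.g. 28x28).
        real = real.view(-1, 784).to(device)
        batch_size = real.shape[0]

        noise = torch.randn(batch_size, z_dim, device=device)
        fake = gen(noise)

        # Step 1: update the discriminator (real -> 1, fake -> 0).
        disc_real = disc(real).view(-1)
        lossD_real = criterion(disc_real, torch.ones_like(disc_real))

        # detach() stops the discriminator update from backpropagating into gen,
        # so retain_graph=True is no longer needed.
        disc_fake = disc(fake.detach()).view(-1)
        lossD_fake = criterion(disc_fake, torch.zeros_like(disc_fake))

        lossD = (lossD_real + lossD_fake) / 2

        disc.zero_grad()
        lossD.backward()
        opt_disc.step()

        # Step 2: update the generator with the non-saturating loss (b):
        # BCE against all-ones targets equals -E[log D(G(z))].
        output = disc(fake).view(-1)
        lossG = criterion(output, torch.ones_like(output))

        gen.zero_grad()
        lossG.backward()
        opt_gen.step()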

5. Least Squares GAN (LSGAN)

$$ \mathcal{L}_D = \frac{1}{2} \mathbb{E}[(D(x) - 1)^2] + \frac{1}{2} \mathbb{E}[(D(G(z)))^2] $$

$$ \mathcal{L}_G = \frac{1}{2} \mathbb{E}[(D(G(z)) - 1)^2] $$
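The two LSGAN losses above are plain mean-squared errors against the targets 1 (real) and 0 (fake). A minimal sketch in plain Python, assuming `d_real` and `d_fake` are hypothetical lists of raw discriminator scores for one batch:

```python
def lsgan_d_loss(d_real, d_fake):
    """0.5 * E[(D(x) - 1)^2] + 0.5 * E[D(G(z))^2], estimated on one batch."""
    real_term = 0.5 * sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = 0.5 * sum(d ** 2 for d in d_fake) / len(d_fake)
    return real_term + fake_term

def lsgan_g_loss(d_fake):
    """0.5 * E[(D(G(z)) - 1)^2]: the generator wants fakes scored as 1."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)

# Hypothetical scores: reals near 1, fakes near 0.
d_real = [0.8, 1.2]
d_fake = [0.1, -0.3]
print(lsgan_d_loss(d_real, d_fake), lsgan_g_loss(d_fake))
```

Unlike the log-based losses, the quadratic penalty keeps growing with the distance from the target, so a fake that sits far from the target still receives a gradient.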


6. Wasserstein GAN (WGAN)

$$ \mathcal{L}_D = - \mathbb{E}[D(x)] + \mathbb{E}[D(G(z))] $$

$$ \mathcal{L}_G = - \mathbb{E}[D(G(z))] $$
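In WGAN, D acts as a critic that outputs an unbounded real-valued score rather than a probability, so the losses are simple differences of batch means. The sketch below assumes the critic is kept 1-Lipschitz by some other mechanism, which the formulas above take for granted; `d_real` and `d_fake` are hypothetical lists of critic scores.

```python
def wgan_critic_loss(d_real, d_fake):
    """L_D = -E[D(x)] + E[D(G(z))], estimated on one batch."""
    return -sum(d_real) / len(d_real) + sum(d_fake) / len(d_fake)

def wgan_g_loss(d_fake):
    """L_G = -E[D(G(z))]: the generator tries to raise the critic's score."""
    return -sum(d_fake) / len(d_fake)

# Hypothetical critic scores: reals scored high, fakes scored low.
d_real = [2.0, 1.0]
d_fake = [-1.0, 0.0]
critic_loss = wgan_critic_loss(d_real, d_fake)
gen_loss = wgan_g_loss(d_fake)
print(critic_loss, gen_loss)
```

The negated critic loss, E[D(x)] - E[D(G(z))], is the quantity usually monitored during training as an estimate of the Wasserstein distance.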


7. WGAN-GP

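$$ \mathcal{L}_D = - \mathbb{E}[D(x)] + \mathbb{E}[D(G(z))] + \lambda \mathbb{E}_{\hat{x}} [(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2] $$

where the penalty is evaluated at random interpolates between real and generated samples:

$$ \hat{x} = \epsilon x + (1 - \epsilon) G(z), \quad \epsilon \sim U[0, 1] $$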

$$ \mathcal{L}_G = - \mathbb{E}[D(G(z))] $$
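The gradient penalty can be illustrated without an autodiff library by using a toy critic whose input gradient is known in closed form. The sketch below assumes a hypothetical critic D(x) = tanh(w · x), samples each x̂ uniformly on the segment between a real and a generated sample, and uses `lam=10.0` as an assumed default weight. In a real model the gradient would come from automatic differentiation rather than a hand-written formula.

```python
import math
import random

def critic(x, w):
    """Toy critic D(x) = tanh(w . x); a stand-in for a neural network."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def critic_grad(x, w):
    """Analytic gradient of the toy critic with respect to its input x."""
    scale = 1.0 - critic(x, w) ** 2
    return [scale * wi for wi in w]

def gradient_penalty(real, fake, w, rng):
    """Estimate E[(||grad D(x_hat)||_2 - 1)^2] over paired real/fake samples."""
    total = 0.0
    for x, x_fake in zip(real, fake):
        eps = rng.random()  # eps ~ U[0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x, x_fake)]
        grad_norm = math.sqrt(sum(g * g for g in critic_grad(x_hat, w)))
        total += (grad_norm - 1.0) ** 2
    return total / len(real)

def wgan_gp_critic_loss(real, fake, w, lam=10.0, rng=None):
    """-E[D(x)] + E[D(G(z))] + lam * gradient penalty, on one batch."""
    rng = rng or random.Random(0)
    d_real = sum(critic(x, w) for x in real) / len(real)
    d_fake = sum(critic(x, w) for x in fake) / len(fake)
    return -d_real + d_fake + lam * gradient_penalty(real, fake, w, rng)

# Hypothetical 2-D batch: two real samples and two generated samples.
real = [[1.0, 2.0], [0.5, 1.5]]
fake = [[0.0, 0.0], [-0.5, 0.2]]
print(wgan_gp_critic_loss(real, fake, w=[0.3, -0.1]))
```

The penalty is zero only where the critic's gradient norm is exactly 1, which is how the loss softly enforces the 1-Lipschitz constraint.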


Pix2Pix GAN

Conditional GAN: image → image

Loss

$$ \mathcal{L}_{GAN}(G, D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x)))] $$

$$ \mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}[\|y - G(x)\|_1] $$

$$ G^* = \arg \min_G \max_D \mathcal{L}_{GAN} + \lambda \mathcal{L}_{L1} $$
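From the generator's side, the objective above is the adversarial term plus a weighted L1 reconstruction term. A minimal sketch in plain Python, with images represented as flat lists of pixel values; `lam=100.0` is an assumed default, and the L1 term is averaged per pixel here as a scaling choice rather than summed.

```python
import math

def l1_loss(target, output):
    """Mean absolute error between target y and generator output G(x)."""
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)

def pix2pix_g_loss(d_fake, target, output, lam=100.0):
    """E[log(1 - D(x, G(x)))] + lam * L1, the part of the objective G minimizes.

    d_fake: hypothetical discriminator probabilities D(x, G(x)) in (0, 1).
    """
    adversarial = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return adversarial + lam * l1_loss(target, output)

# Hypothetical two-pixel "image" and an undecided discriminator.
target = [1.0, 0.0]
output = [0.5, 0.5]
loss = pix2pix_g_loss([0.5], target, output)
print(loss)
```

With a large weight on the L1 term, the reconstruction error dominates the generator loss whenever the output is far from the target, while the adversarial term mainly shapes fine detail.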

Summary

Intuition: Generate a realistic image that matches the input.

Variational Autoencoder (VAE)

Gaussian Mixture Model (GMM)

In Gaussian Mixture Models, we assume that the data is generated from a mixture of several Gaussian distributions. Each Gaussian component is associated with a value of a latent variable z that indicates which component generated the data point x. For example, MNIST digits can be modeled as a mixture of 10 Gaussians, where each Gaussian corresponds to a digit class (0-9).

$$ z \to x $$

$$ p(x) = \sum_z p(x,z) = \sum_z p(z) p(x|z) = \sum_{k=1}^{K} p(z = k) \mathcal{N}(x; \mu_k, \Sigma_k) $$
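The mixture density and the z → x generative process can be sketched in one dimension with plain Python. The weights, means, and standard deviations below are hypothetical values chosen for illustration, not fitted parameters.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(x; mean, std^2)."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, stds):
    """p(x) = sum_k p(z = k) * N(x; mu_k, sigma_k^2)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

def sample_gmm(weights, means, stds, rng):
    """Generative process z -> x: pick a component, then draw from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return k, rng.gauss(means[k], stds[k])

# Hypothetical two-component mixture.
weights = [0.3, 0.7]
means = [-2.0, 3.0]
stds = [1.0, 0.5]

rng = random.Random(0)
z, x = sample_gmm(weights, means, stds, rng)
print(z, x, gmm_pdf(x, weights, means, stds))
```

Because the latent variable is discrete, the marginal p(x) is a finite sum that can be evaluated exactly.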

The mixture weights, means, and covariances of the Gaussian components are learned with the Expectation-Maximization (EM) algorithm. A VAE can be viewed as the limit of infinitely many Gaussian components, indexed by a continuous latent variable z:

$$ z \sim \mathcal{N}(0, I) $$

$$ p(x|z) = \mathcal{N}(x; \mu(z), \operatorname{diag}(\sigma^2(z))) $$

where μ(z) and σ(z) are outputs of the decoder network, whose weights are learned during training.

GMM vs VAE: Understanding the Analogy

In Gaussian Mixture Models (GMM), we assume that data is generated from a finite number of Gaussian components. Each component is selected by a discrete latent variable z.

In contrast, Variational Autoencoders (VAE) use a continuous latent space, so the sum over components becomes an integral over z.

GMM Formulation

$$ p(x) = \sum_{k=1}^{K} p(z = k) \mathcal{N}(x; \mu_k, \Sigma_k) $$

VAE Formulation

$$ p(x) = \int p(z) p(x | z) \, dz $$
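The integral over z generally has no closed form, but it can be approximated by averaging p(x | z) over samples z ~ N(0, 1). The sketch below uses a hypothetical one-dimensional linear decoder, μ(z) = 2z with σ(z) = 1, chosen only because its exact marginal is known to be N(0, 5), which lets the estimate be checked. Real VAEs do not train this way; they optimize a variational lower bound instead.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(x; mean, std^2)."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))

def decoder(z):
    """Hypothetical toy decoder: returns (mu(z), sigma(z))."""
    return 2.0 * z, 1.0

def marginal_mc(x, num_samples=20000, seed=0):
    """Monte Carlo estimate of p(x) = integral of p(z) p(x | z) dz."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)          # z ~ p(z) = N(0, 1)
        mu, sigma = decoder(z)
        total += normal_pdf(x, mu, sigma)  # p(x | z)
    return total / num_samples

# For this linear decoder, x = 2z + noise, so p(x) = N(x; 0, 4 + 1).
exact = normal_pdf(1.0, 0.0, math.sqrt(5.0))
estimate = marginal_mc(1.0)
print(exact, estimate)
```

Each sampled z plays the role of one mixture component, so the Monte Carlo average is a finite GMM with equal weights that approximates the continuous mixture.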

Key Intuition

Geometric View

Important Note

VAE is not literally a GMM with infinite components, but it can be interpreted as a continuous mixture model where integration replaces summation.