
Understanding Wasserstein GANs

Wasserstein Generative Adversarial Networks (WGANs) are a variation of GANs introduced in 2017 that use a modified loss function based on the Wasserstein distance to address issues such as mode collapse. Both the generator and the critic (the WGAN counterpart of the discriminator) are deep neural networks. Because the Wasserstein distance is continuous and differentiable almost everywhere, it gives usable gradients throughout training, which improves stability and reduces the risk of mode collapse.

Uploaded by

ambrose

Wasserstein Generative Adversarial Networks (WGANs)

Last Updated : 15 Apr, 2025

A Wasserstein Generative Adversarial Network (WGAN) is a variation of the Generative Adversarial Network (GAN)
that makes a small modification to the training algorithm. A GAN is a method for building a generative model by
training a generator against a discriminator. Martin Arjovsky, Soumith Chintala, and Léon Bottou introduced the
WGAN in 2017. It is widely used to generate realistic images.
Wasserstein Generative Adversarial Network

A WGAN uses deep neural networks for both the generator and the discriminator, which is called the critic
in this setting. The key differences between GANs and WGANs are the loss function and the way the critic is
constrained: the original WGAN clips the critic's weights, while the later WGAN-GP variant uses a gradient
penalty instead. WGANs were introduced as a solution to mode collapse. The network uses the Wasserstein
distance, which provides a more meaningful and smoother measure of distance between distributions.

WGAN architecture
The Wasserstein distance, also called the Earth-Mover (EM) distance, between the real data distribution Pr
and the generated distribution Pg is defined as:

$$
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma} \left[ \| x - y \| \right]
$$

γ(x, y) denotes how much mass is transported from x to y in order to transform the distribution Pr into Pg.
Π(Pr, Pg) denotes the set of all joint distributions γ(x, y) whose marginals are Pr and Pg, respectively.
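
As a small illustration of the definition above, the following sketch computes the distance between two one-dimensional samples. It relies on a standard fact that holds only in one dimension: for equal-size samples, the optimal transport plan pairs the i-th smallest point of one sample with the i-th smallest point of the other. The function name `wasserstein_1d` and the sample values are hypothetical, chosen for this example.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples."""
    if not xs or len(xs) != len(ys):
        raise ValueError("this sketch assumes two non-empty, equal-size samples")
    # In 1-D the optimal plan matches sorted points pairwise, so the
    # infimum over transport plans reduces to a mean of absolute differences.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


real = [0.0, 1.0, 3.0]
fake = [5.0, 6.0, 8.0]  # the same shape, shifted right by 5
print(wasserstein_1d(real, fake))  # 5.0
```

Shifting a sample by a constant moves every unit of mass by that constant, so the distance equals the size of the shift.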

The benefits of using the Wasserstein distance instead of the Jensen-Shannon (JS) or Kullback-Leibler (KL)
divergence are as follows:

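W(Pr, Pg) is continuous in the generator's parameters, under mild regularity assumptions on the generator.
W(Pr, Pg) is differentiable almost everywhere under the same assumptions.
The JS and KL divergences, by contrast, are not continuous in those parameters. When the two distributions
have disjoint supports, the JS divergence stays at log 2 and the KL divergence is infinite, so neither gives a useful gradient.
Hence, gradient descent can be used to minimize the Wasserstein distance as a cost function.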
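
The contrast above can be checked on a very small case: a point mass at 0 compared with a point mass at θ. The sketch below is illustrative only; the helper names `js_divergence` and `w1_discrete` are hypothetical, and distributions are represented as plain dictionaries mapping a point on the real line to its probability.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (dicts)."""
    total = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution M = (P + Q) / 2
        if pk > 0:
            total += 0.5 * pk * math.log(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log(qk / mk)
    return total


def w1_discrete(p, q):
    """1-D Wasserstein-1 distance as the area between the two CDFs."""
    xs = sorted(set(p) | set(q))
    cdf_p = cdf_q = total = 0.0
    for left, right in zip(xs, xs[1:]):
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


p0 = {0.0: 1.0}  # point mass at 0
for theta in [1.0, 0.5, 0.1, 0.0]:
    p_theta = {theta: 1.0}  # point mass at theta
    print(theta, js_divergence(p0, p_theta), w1_discrete(p0, p_theta))
```

For every θ other than 0 the JS divergence is log 2 (about 0.693), and it jumps to 0 only at θ = 0, so it carries no information about how far apart the two point masses are. The Wasserstein distance equals |θ| and shrinks smoothly as θ approaches 0.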

Wasserstein GAN Algorithm


The algorithm is stated as follows:
The critic function f solves the maximization problem given by the Kantorovich-Rubinstein duality. To
approximate it, a neural network f_w is trained, parametrized by weights w lying in a compact space W,
and the generator is then updated by backpropagating through the critic, as in a typical GAN.
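
For reference, the duality mentioned above can be written out as follows. The notation is taken from the original WGAN paper rather than from this article: f_w is the critic, g_θ is the generator, p(z) is the latent prior, and the supremum runs over 1-Lipschitz functions f.

```latex
W(P_r, P_g) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]

\max_{w \in \mathcal{W}} \; \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))]
```

The second expression is the problem the critic network actually solves. If every f_w in the family is K-Lipschitz for some constant K, its maximum equals the Wasserstein distance up to the multiplicative factor K.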
To keep the parameters w in a compact space, the weights are clamped to a fixed box after each gradient
update. The authors themselves call weight clipping a terrible way to enforce a Lipschitz constraint, but
it is simple and gave good results in their experiments, so it is the method implemented. Because the EM
distance is continuous and differentiable almost everywhere, the critic can be trained till optimality.
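
A minimal sketch of the clamping step, assuming a toy linear critic f_w(x) = w · x in place of a deep network. The names `clip_weights` and `linear_critic` are hypothetical, and c = 0.01 is the default clipping value from the original paper, not a value stated in this article. For this linear critic, clipping every weight to [-c, c] bounds the Lipschitz constant by c·sqrt(d), where d is the input dimension, and the last lines check that bound on one random pair of points.

```python
import math
import random


def clip_weights(weights, c):
    """Clamp every weight into the box [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


def linear_critic(weights, x):
    """Toy critic f_w(x) = w . x, a stand-in for a deep network."""
    return sum(w * xi for w, xi in zip(weights, x))


rng = random.Random(0)
c = 0.01  # clipping threshold (assumed default)
dim = 4
weights = clip_weights([rng.uniform(-1.0, 1.0) for _ in range(dim)], c)

# With |w_i| <= c, the Euclidean norm of w is at most c * sqrt(dim), so
# |f_w(x) - f_w(y)| <= c * sqrt(dim) * ||x - y|| by Cauchy-Schwarz.
lipschitz_bound = c * math.sqrt(dim)
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
gap = abs(linear_critic(weights, x) - linear_critic(weights, y))
print(gap <= lipschitz_bound * math.dist(x, y))
```

For a deep critic the same clamp is applied to every parameter after each update. The resulting Lipschitz constant then depends on the architecture, which is one reason the choice of c is delicate.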
With the JS divergence, the gradient vanishes as the discriminator improves, because the JS divergence
saturates locally. Constraining the weights instead limits the growth of the critic to be at most linear
in different parts of the space, which forces the optimal critic to have this behaviour.
Mode collapse arises because the optimal generator for a fixed discriminator is a sum of deltas on the
points to which the discriminator assigns the highest values. Training the critic till optimality before
each generator update prevents modes from collapsing.
Because the critic f is trained well in the inner loop before each generator update, the loss at that point is
an estimate of the EM distance, up to a constant factor. In the authors' experiments this estimate correlates
well with the visual quality of the generated samples, which they report as a first in the GAN literature.
This makes it very convenient to identify failure modes and to learn which models perform better than
others without having to look at the generated samples.
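
The steps above can be sketched on a deliberately tiny problem. Everything in this example is an assumption made for illustration, not the article's or the paper's implementation: the data is one-dimensional Gaussian, the critic is linear (f_w(x) = w·x), the generator only shifts noise (g_θ(z) = z + θ), plain gradient steps replace the RMSProp optimizer used in the original paper, and the clipping threshold is 1.0 so that the linear critic is exactly 1-Lipschitz. The value n_critic = 5 matches the paper's default.

```python
import random


def mean(values):
    return sum(values) / len(values)


def train_toy_wgan(real_mean=4.0, clip=1.0, n_critic=5, lr_critic=0.1,
                   lr_gen=0.05, batch=64, gen_steps=500, seed=0):
    rng = random.Random(seed)
    w = 0.0      # critic parameter: f_w(x) = w * x
    theta = 0.0  # generator parameter: g_theta(z) = z + theta
    em_estimates = []
    for _ in range(gen_steps):
        # Inner loop: train the critic for n_critic steps.
        for _ in range(n_critic):
            real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
            fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
            # Critic objective: mean f_w(real) - mean f_w(fake).
            # For a linear critic its gradient in w is the mean difference.
            grad_w = mean(real) - mean(fake)
            w += lr_critic * grad_w       # gradient ascent on the objective
            w = max(-clip, min(clip, w))  # weight clipping
        # Critic objective on the last batch: the EM distance estimate.
        em_estimates.append(w * grad_w)
        # Generator loss: -mean f_w(g_theta(z)); its gradient in theta is -w.
        theta -= lr_gen * (-w)
    return theta, em_estimates


theta, em_estimates = train_toy_wgan()
print("learned shift:", round(theta, 2))
print("first / last EM estimate:", round(em_estimates[0], 2),
      round(em_estimates[-1], 2))
```

In this toy setting the true EM distance is the gap between the two means, so the recorded estimate starts near 4 and falls as the learned shift approaches the real mean. That is the sense in which the critic loss tracks sample quality. With a fixed step size the shift keeps oscillating in a small band around the real mean rather than settling exactly on it.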

Benefits of WGAN algorithm over GAN

WGAN training is more stable because the Wasserstein distance is continuous and differentiable almost
everywhere, so gradient descent always has a usable gradient.
The critic can be trained till optimality.
The authors report no evidence of mode collapse in their experiments.
Gradient descent does not stall on vanishing gradients, as it can with the JS divergence.
WGANs provide more flexibility in the choice of network architecture. The weight clipping threshold and the
generator architecture can be changed as needed.
