
Understanding Generative Adversarial Networks - Part II

In "Understanding Generative Adversarial Networks - Part I" you gained a conceptual understanding of how GANs work. In this post, let us build a mathematical understanding of GANs.

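The loss functions are easiest to design using the idea of a zero-sum game, in which 
the costs of all players sum to 0: whatever the discriminator gains, the generator loses.

The minimax objective for GANs is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$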

Let’s break it down.

Some terminology:
V(D, G): The value function of the minimax game
E(X): The expectation of a random variable X, i.e. its average value
D(x): The discriminator's output for an input x from the real data; the probability that x is real
G(z): The generator's output when it is given z from the noise distribution
D(G(z)): Combining the above, the discriminator's output when given a generated image G(z) as input

The discriminator is the maximizer, so it tries to maximize V(D, G). It wants to 
correctly label an image from the input data as real.

Thus, it tries to make D(x) large. At the same time, a generated image (created by the 
generator) must have a very low chance of coming from the input data: it should be 
labelled fake. Thus, D(G(z)) should be small, or 1 - D(G(z)) should be large. And as log 
is an increasing function (log(x) increases with increasing x), both log D(x) and 
log(1 - D(G(z))) grow, which is how V(D, G) gets maximized.

The converse is true for the generator. It wants to increase the chance of the 
discriminator incorrectly classifying a generated image as real. Thus, D(G(z)) should be 
large. As this term increases, log(1 - D(G(z))) decreases. Thus, V(D, G) decreases here.
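
To make the two opposing pulls on V(D, G) concrete, here is a minimal sketch that estimates the value function from a batch of discriminator outputs. The function name `value_function` and the probabilities fed into it are made up for illustration; they do not come from a trained model.

```python
import math

def value_function(d_real, d_fake):
    """Batch estimate of V(D, G) from discriminator outputs.

    d_real: values of D(x) for real samples, each strictly between 0 and 1
    d_fake: values of D(G(z)) for generated samples, each strictly between 0 and 1
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs: a discriminator that separates real from fake well...
confident_d = value_function(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# ...versus the same discriminator being fooled by a better generator.
fooled_d = value_function(d_real=[0.9, 0.95], d_fake=[0.8, 0.9])

print(confident_d > fooled_d)  # True
```

The first call corresponds to the discriminator's goal (large V), the second to the generator's goal (small V).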

Now that we understand the intuition behind the minimax objective for 
adversarial networks, let’s discuss the gradients.



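For a minibatch of $m$ real samples $x^{(i)}$ and $m$ noise samples $z^{(i)}$, the 
gradient for the discriminator, whose parameters are $\theta_d$, is:

$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(x^{(i)}) + \log(1 - D(G(z^{(i)}))) \right]$$

As explained above, the discriminator has to maximize the minimax value function 
V(D, G). Thus, it must undergo what is called gradient ascent (yes, ascent, not descent). 
Its weights must be updated in the direction of the above gradient.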
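
As a rough illustration of a single gradient ascent step, the sketch below uses a deliberately tiny discriminator, D(x) = sigmoid(w*x + b), on one-dimensional data. The model, the sample values, and the learning rate `lr` are all assumptions chosen for the example, not part of the original algorithm description.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def d_value(w, b, real, fake):
    """Batch estimate of V(D, G) for the toy discriminator D(x) = sigmoid(w*x + b)."""
    real_term = sum(math.log(sigmoid(w * x + b)) for x in real) / len(real)
    fake_term = sum(math.log(1.0 - sigmoid(w * g + b)) for g in fake) / len(fake)
    return real_term + fake_term

def d_gradient(w, b, real, fake):
    """Analytic gradient of d_value with respect to (w, b)."""
    grad_w = grad_b = 0.0
    for x in real:   # d/da log(sigmoid(a)) = 1 - sigmoid(a)
        p = sigmoid(w * x + b)
        grad_w += (1.0 - p) * x / len(real)
        grad_b += (1.0 - p) / len(real)
    for g in fake:   # d/da log(1 - sigmoid(a)) = -sigmoid(a)
        p = sigmoid(w * g + b)
        grad_w += -p * g / len(fake)
        grad_b += -p / len(fake)
    return grad_w, grad_b

real = [2.0, 2.5, 3.0]    # hypothetical real samples
fake = [-1.0, 0.0, 0.5]   # hypothetical generated samples G(z)
w, b, lr = 0.0, 0.0, 0.1

before = d_value(w, b, real, fake)
gw, gb = d_gradient(w, b, real, fake)
w, b = w + lr * gw, b + lr * gb   # ascent: step *along* the gradient
after = d_value(w, b, real, fake)

print(after > before)  # True
```

Note the plus sign in the update: the parameters move along the gradient, so the value function goes up.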

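Coming to the generator, it must undergo gradient descent along the following gradient 
($\theta_g$ denotes the generator's parameters and $z^{(i)}$ the $m$ noise samples in a minibatch):

$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(z^{(i)})))$$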
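
The generator's update can be sketched in the same toy setting. Here the generator is assumed to be G(z) = z + theta with a single parameter, and the discriminator is held fixed at D(x) = sigmoid(W*x + B); the constants `W`, `B`, the noise values, and `lr` are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

W, B = 1.0, -1.0  # fixed, hypothetical discriminator D(x) = sigmoid(W*x + B)

def g_loss(theta, noise):
    """Batch mean of log(1 - D(G(z))) for the toy generator G(z) = z + theta."""
    return sum(math.log(1.0 - sigmoid(W * (z + theta) + B)) for z in noise) / len(noise)

def g_gradient(theta, noise):
    """Analytic gradient of g_loss with respect to theta."""
    # d/da log(1 - sigmoid(a)) = -sigmoid(a), and da/dtheta = W
    return sum(-sigmoid(W * (z + theta) + B) * W for z in noise) / len(noise)

noise = [-0.5, 0.0, 0.5]  # hypothetical noise samples z
theta, lr = 0.0, 0.5

loss_before = g_loss(theta, noise)
theta = theta - lr * g_gradient(theta, noise)  # descent: step *against* the gradient
loss_after = g_loss(theta, noise)

print(loss_after < loss_before)  # True
```

Because only the second term of V(D, G) depends on the generator, the first term, log D(x), does not appear in this update.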


Now comes the actual implementation:
A for loop over the number of training iterations encompasses the entire procedure, 
as expected. Inside it, another for loop runs the discriminator training part for k 
iterations. This means that for every k updates of the discriminator, the generator’s 
weights and biases are updated only once.


The reason for this is to avoid something called the Helvetica Scenario. Let’s go back 
to the forger-officer analogy. Suppose that particular officer is colour blind. Now, if the 
forger makes fake money that is identical to real money except for a slight but 
noticeable difference in colour, the officer will treat the forged money as 
authentic. As the officer gives no feedback on how to improve, the forger 
has no reason to improve his or her technique. After that, all generated currency will 
fool that particular officer, but it won’t actually be what we hoped for: 
indistinguishable from real currency.

This is the gist of what the Helvetica Scenario means. The generator unintentionally 
finds a small weakness in the discriminator and exploits it, succeeding in the immediate 
goal, but failing in the long term.

Hence, it is more important to train the discriminator first. Once the discriminator is 
reasonably confident, it can give very valuable feedback to the generator, which in turn 
helps achieve our end goal, which is to generate a life-like image.

Coming back to the algorithm, in each of those k iterations, the discriminator’s 
parameters are updated.

Then, the generator is trained for one iteration, and this process continues till 
convergence. The value of k can vary a lot; the minimum is, of course, 1.
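
Putting the pieces together, the loop structure described above (k discriminator updates, then one generator update, repeated) might look like the following sketch. It is a toy under stated assumptions, not a practical GAN: the real data is assumed to be Gaussian with mean 3, the generator is G(z) = z + theta, the discriminator is D(x) = sigmoid(w*x + b), and `train_toy_gan` along with all of its parameters is a hypothetical name.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_toy_gan(iterations, k, batch=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + b)
    theta = 0.0       # generator G(z) = z + theta
    d_updates = g_updates = 0

    for _ in range(iterations):
        # Inner loop: k gradient ascent steps on the discriminator.
        for _ in range(k):
            real = [rng.gauss(3.0, 0.5) for _ in range(batch)]
            fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
            grad_w = grad_b = 0.0
            for x in real:
                p = sigmoid(w * x + b)
                grad_w += (1.0 - p) * x / batch
                grad_b += (1.0 - p) / batch
            for g in fake:
                p = sigmoid(w * g + b)
                grad_w -= p * g / batch
                grad_b -= p / batch
            w, b = w + lr * grad_w, b + lr * grad_b
            d_updates += 1

        # One gradient descent step on the generator.
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        grad_theta = sum(-sigmoid(w * (z + theta) + b) * w for z in noise) / batch
        theta -= lr * grad_theta
        g_updates += 1

    return theta, d_updates, g_updates

theta, d_updates, g_updates = train_toy_gan(iterations=100, k=2)
print(d_updates, g_updates)  # 200 100
```

The update counters make the k-to-1 ratio explicit: with k = 2, the discriminator is updated twice as often as the generator.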





By 

Aniruddha Karajgi,
Research Intern,
Cere Labs Pvt. Lt.
