Derivation of ELBO in VAE

I read a blog post on how to build a VAE using TensorFlow, and then reviewed the fundamental equation behind the VAE, the Evidence Lower BOund (ELBO), using David Blei’s material.

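For data x, latent variable z, approximate posterior q(z|x) and prior p(z), the ELBO is

$$\log p(x) \ge \mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right] - \mathrm{KL}\left(q(z \mid x) \,\|\, p(z)\right)$$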

So the VAE finds a lower bound on the log likelihood log p(x) using Jensen’s inequality, which also appears in the derivation of the EM algorithm.
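To spell out the Jensen step, here is a sketch of the standard derivation. The notation is an assumption: q(z|x) is the approximate posterior (the encoder), p(z) is the prior, and p(x|z) is the likelihood (the decoder).

```latex
\begin{aligned}
\log p(x) &= \log \int p(x, z)\, dz
           = \log \mathbb{E}_{q(z \mid x)}\!\left[\frac{p(x, z)}{q(z \mid x)}\right] \\
          &\ge \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x, z)}{q(z \mid x)}\right]
           \quad \text{(Jensen, since $\log$ is concave)} \\
          &= \mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right]
           - \mathrm{KL}\!\left(q(z \mid x) \,\|\, p(z)\right)
\end{aligned}
```

The last line uses p(x, z) = p(x|z) p(z), so the log of the ratio splits into the log likelihood and the log of p(z) / q(z|x), whose expectation under q is the negative KL divergence.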

Intuitively, the first term of the ELBO is the expected log likelihood. Maximizing it pushes the generated image to depend more strongly on the latent variable, which makes the model more deterministic.

The second term of the ELBO is the KL divergence between the approximate posterior and the prior, and maximizing the ELBO minimizes it. We usually assume the prior is a standard Gaussian distribution (why?), so minimizing the KL makes the posterior more similar to that prior: it pushes the posterior toward a smooth Gaussian that spreads evenly over the latent space, which gives the model more randomness.

So the VAE objective seems to contain something like adversarial training: the two terms pull against each other.
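To make the KL term concrete, here is a minimal NumPy sketch. It assumes the posterior is a diagonal Gaussian parameterized by a mean `mu` and a log variance `log_var` (these names and the function `kl_to_standard_normal` are my own, not from the post), and uses the well-known closed form for the KL divergence to a standard Gaussian prior.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

# A posterior identical to the prior has zero KL.
kl_same = kl_to_standard_normal(np.zeros(2), np.zeros(2))

# Shifting the mean away from zero is penalized.
kl_shifted = kl_to_standard_normal(np.array([1.0, -1.0]), np.zeros(2))

print(kl_same, kl_shifted)
```

The KL is smallest when the posterior matches the prior exactly, which is the pull toward "more randomness" described above.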

Next, I’ll use an example to illustrate this formula.


The original VAE paper (Kingma and Welling, “Auto-Encoding Variational Bayes”) says: “In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100.”

So for each image, only one sample z is used to estimate the likelihood (which is quite amazing). This means the first term of the ELBO is approximated by just log p(x|z). (Follow-up works use more accurate estimates of the likelihood.)
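Here is a toy NumPy sketch of why one sample per datapoint can be enough once you average over a minibatch. Everything in it is an assumption for illustration: each datapoint gets its own Gaussian over z, and `toy_log_lik` is a stand-in for log p(x|z), chosen only because its expectation is known in closed form.

```python
import numpy as np

def single_sample_estimate(mu, sigma, f, rng):
    """Estimate the minibatch mean of E_{z ~ N(mu_i, sigma_i^2)}[f(z)] with L = 1.

    mu, sigma: arrays of shape (M,), one Gaussian per datapoint.
    """
    z = mu + sigma * rng.standard_normal(mu.shape)  # one draw of z per datapoint
    return f(z).mean()  # average over the M datapoints

def toy_log_lik(z):
    # Hypothetical stand-in for log p(x|z); E[z^2] = mu^2 + sigma^2 is known exactly.
    return -0.5 * z ** 2

rng = np.random.default_rng(0)
M = 100  # minibatch size, as in the quote
mu = rng.uniform(-1.0, 1.0, size=M)
sigma = np.full(M, 0.5)

estimate = single_sample_estimate(mu, sigma, toy_log_lik, rng)
exact = np.mean(-0.5 * (mu ** 2 + sigma ** 2))
print(estimate, exact)
```

Each single-sample estimate is noisy, but the noise largely averages out across the M datapoints in the minibatch.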

If we assume p(x|z) is Bernoulli for each pixel, its parameter is given by the reconstructed pixel value. Writing x_i for the real (binary) value of pixel i and \hat{x}_i for its reconstructed value:

$$p(x_i = 1 \mid z) = \hat{x}_i$$

This means that if the real pixel value is 1, the likelihood for that pixel is

$$p(x_i \mid z) = \hat{x}_i$$

Otherwise, if the real pixel value is 0, the likelihood for that pixel is

$$p(x_i \mid z) = 1 - \hat{x}_i$$

Combining the two cases and taking the log, we have

$$\log p(x_i \mid z) = x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i)$$

which is the negative of the cross entropy between the real pixel and the reconstructed pixel. We then sum over all pixels and average over all samples in the batch.
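As a rough sketch of this reconstruction term, here it is in plain NumPy rather than TensorFlow. The function name `bernoulli_log_likelihood`, the array shapes, and the clipping constant `eps` are my own assumptions; `x` holds the real binary pixels and `x_hat` the reconstructed pixel values in (0, 1).

```python
import numpy as np

def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """Sum the per-pixel Bernoulli log likelihood over pixels, then average over the batch.

    x, x_hat: arrays of shape (batch, pixels).
    """
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    per_pixel = x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)
    return per_pixel.sum(axis=1).mean()

x = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
x_hat = np.array([[0.9, 0.1, 0.8],
                  [0.2, 0.1, 0.7]])

log_lik = bernoulli_log_likelihood(x, x_hat)
reconstruction_loss = -log_lik  # the cross entropy that gets minimized
print(reconstruction_loss)
```

The loss minimized in practice is the negative of this log likelihood, i.e. the cross entropy itself.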

Here is a very good implementation using TensorFlow.
