6 Variational inference: Scaling to large datasets
Approximating Bayesian posteriors with simple distributions
This chapter covers
- Turning Bayesian inference into an optimization problem
- The Kullback–Leibler (KL) divergence
- The reparameterization trick
- The pros and cons of variational inference compared to MCMC
We learned about Markov chain Monte Carlo (MCMC) as a powerful tool for performing Bayesian inference when a conjugate prior is unavailable. But this power comes at a cost: for complex models or large datasets, MCMC can be prohibitively slow because the sampler must thoroughly explore the landscape of the posterior distribution. That exploration can take hours or days, making MCMC impractical in many real-world settings.
Variational inference (VI) offers a different workaround for nonconjugate priors. Instead of drawing samples until we (hopefully) have enough representative information, VI reframes inference (that is, finding the posterior distribution) as an optimization problem: we pick a family of probability distributions that are easy to work with and find the member of that family that's closest to the true posterior, with closeness measured by the Kullback–Leibler (KL) divergence.
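To make that idea concrete, here is a minimal sketch of VI as optimization. It is not the chapter's own example: it assumes PyTorch as the optimization library and uses a toy Gaussian model chosen only because its exact posterior is known, so we can check the answer. We approximate the posterior with a Gaussian q(theta) = N(mu, sigma^2) and adjust mu and sigma by gradient ascent on the evidence lower bound (ELBO), a standard VI objective that is equivalent, up to a constant, to minimizing the KL divergence to the true posterior. The samples of theta are drawn via the reparameterization trick, which both appear later in this chapter.

```python
# A minimal sketch of variational inference as optimization (assumes PyTorch;
# the toy model and all names here are illustrative, not the book's example).
import torch

torch.manual_seed(0)

# Toy model: theta ~ N(0, 1) prior, observations y_i ~ N(theta, 1).
# This conjugate case has a known exact posterior, handy for a sanity check.
y = torch.tensor([1.2, 0.8, 1.5, 0.9, 1.1])

def log_joint(theta):
    """Unnormalized log posterior: log prior + log likelihood."""
    log_prior = -0.5 * theta**2
    log_lik = -0.5 * ((y - theta) ** 2).sum()
    return log_prior + log_lik

# Variational parameters of q(theta) = N(mu, sigma^2); we optimize
# log_sigma so that sigma stays positive.
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(2000):
    opt.zero_grad()
    sigma = log_sigma.exp()
    eps = torch.randn(32)                   # Monte Carlo draws of epsilon
    theta = mu + sigma * eps                # reparameterization trick
    # ELBO ~= E_q[log p(theta, y)] + entropy of q (up to a constant)
    elbo = torch.stack([log_joint(t) for t in theta]).mean() + log_sigma
    (-elbo).backward()                      # minimize the negative ELBO
    opt.step()

print(f"VI estimate:  mean={mu.item():.3f}, sd={log_sigma.exp().item():.3f}")
print(f"Exact answer: mean={(y.sum() / (len(y) + 1)).item():.3f}, "
      f"sd={(1 / (len(y) + 1)) ** 0.5:.3f}")
```

On this toy model the optimized Gaussian should land close to the exact posterior mean and standard deviation, which is the whole point of the reframing: we traded sampling for a handful of gradient steps.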