Chapter Six

6 Bayesian Tools for Machine Learning and Data Science

 

Say it is October 2020. You meet a statistician on the street and ask for the probability of Joe Biden winning the US presidency. Depending on whether the statistician is a frequentist or a Bayesian, you will get a different answer. The frequentist may say that the event (Joe Biden winning the US presidency) is not a repeatable event, and hence that probability cannot be computed. The Bayesian, on the other hand, will probably try to answer by modelling the overall uncertainties and the prevailing beliefs about the election. Overall, machine learning is much closer to the Bayesian paradigm of statistics. In this chapter we will study Bayesian statistics. Starting with intuitive explanations of conditional probability and Bayes' theorem, we will study the concept of entropy (which models the uncertainty in a probabilistic system) and cross entropy, which is used to model the loss (error) of a supervised classifier that emits probabilities.
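As a small preview of the kind of computation developed later in the chapter, the short PyTorch sketch below evaluates the cross entropy loss between a classifier's raw output scores and a true class label. The tensor names and values are illustrative only and are not taken from the chapter's later examples.

import torch
import torch.nn.functional as F

# Raw (unnormalized) scores emitted by a hypothetical 3-class classifier
# for a single input example
logits = torch.tensor([[2.0, 0.5, -1.0]])

# The true class index for that example
target = torch.tensor([0])

# cross_entropy applies softmax to the logits internally and then
# returns the negative log probability assigned to the true class
loss = F.cross_entropy(logits, target)
print(loss.item())

The smaller this loss, the more probability mass the classifier places on the correct class; sections 6.3 and 6.3.1 develop this idea in detail.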

6.1 Conditional Probability and Bayes' Theorem, with a Recap of Joint and Marginal Probability

6.1.1 Joint and Marginal Probability Revisited

6.1.2 Conditional Probability and Bayes' Theorem

6.2 Entropy

6.2.1 Entropy of a Gaussian

6.2.2 Python PyTorch code to compute Entropy of a Gaussian

6.3 Cross Entropy

6.3.1 Python PyTorch code to compute Cross Entropy

6.4 KL Divergence

6.4.1 KL Divergence between Gaussians

6.4.2 Python PyTorch code to compute KL Divergence

6.5 Conditional Entropy

6.6 Model Parameter Estimation

6.6.1 Likelihood, Evidence, Posterior and Prior Probabilities

6.6.2 The log-likelihood trick

6.6.3 Maximum Likelihood Parameter Estimation (MLE)

6.6.4 Maximum A Posteriori (MAP) Parameter Estimation and Regularization

6.7 Latent Variables and Evidence Maximization

6.8 Maximum Likelihood Parameter Estimation for a Gaussian

6.8.1 Python PyTorch code for Maximum Likelihood Estimation and Maximum A Posteriori Estimation

6.9 Gaussian Mixture Models

6.9.1 Probability Density Function (PDF) of the GMM

6.9.2 Latent Variable for Class Selection and Physical Interpretations of the GMM PDF Terms

6.9.3 Classification via GMM

6.9.4 Maximum Likelihood Estimation of GMM Parameters (GMM Fit)

6.9.5 Python PyTorch code for GMM Fit

Chapter Summary