Say it is October 2020. You meet a statistician on the street and ask for the probability of Joe Biden winning the US presidency. Depending on whether the statistician is a frequentist or a Bayesian, you will get a different answer. The frequentist may say that the event (Joe Biden winning a US presidential election) is not a repeatable event, and hence that the probability cannot be computed. The Bayesian, on the other hand, will probably try to answer by modelling the overall uncertainties and the prevailing beliefs about the election system. Overall, machine learning is much closer to the Bayesian paradigm of statistics. In this chapter we will study Bayesian statistics. Starting with intuitive explanations of conditional probability and Bayes' theorem, we will move on to the concept of entropy, which models the uncertainty in a probabilistic system, and then to cross-entropy, which is used to model the loss (error) of supervised classifiers that emit probabilities.
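As a small preview of where the chapter is headed, the sketch below (plain Python, with a hypothetical `cross_entropy` helper introduced only for illustration) shows how cross-entropy scores the probabilities emitted by a classifier: a confident, correct prediction incurs a small loss, while a confident, wrong prediction is penalised heavily.

```python
import math

def cross_entropy(true_dist, predicted_dist):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i).

    true_dist:      the true (often one-hot) label distribution p
    predicted_dist: the probabilities q emitted by the classifier
    """
    return -sum(p * math.log(q)
                for p, q in zip(true_dist, predicted_dist) if p > 0)

# Confident and correct: small loss (-log 0.9 ~ 0.105)
print(cross_entropy([1.0, 0.0, 0.0], [0.90, 0.05, 0.05]))

# Confident and wrong: large loss (-log 0.05 ~ 2.996)
print(cross_entropy([1.0, 0.0, 0.0], [0.05, 0.90, 0.05]))
```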