Part 2 of this book focuses on using neural networks (NNs) as probabilistic models. You might remember from chapter 1 that there is a primary difference between a non-probabilistic and a probabilistic model. A non-probabilistic model outputs only one best guess for the outcome, whereas a probabilistic model predicts a whole probability distribution over all possible outcomes. In the cab driver example (see section 1.1), the predicted outcome distribution for the travel time of a given route was a Gaussian. But until now, you haven't learned how to set up an NN as a probabilistic model. You learn different methods to do so in this part of the book.
In the case of classification, you already know how to get a probability distribution for the outcome. In the fake banknote example (see section 2.1), you set up an NN that predicted, for a given banknote, a probability for the class fake and a probability for the class real. In the MNIST classification example (see sections 2.1.3 and 2.2.4), you used different NN architectures to predict, for a handwritten digit, the probabilities of the ten possible classes. To do so, you defined an NN whose last layer has as many nodes as there are classes. Further, you used a softmax activation to ensure that the output can be interpreted as a probability distribution: the values are between zero and one and add up to one. Thus, classification NNs are probabilistic models by construction.
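To make this concrete, here is a minimal Keras sketch (not the book's exact code; the hidden-layer architecture is illustrative) of a classification NN whose last layer has one node per class and a softmax activation, so its output can be read as a probability distribution over the ten MNIST classes:

```python
# A minimal sketch of a classification NN with a softmax output layer.
# The hidden layer is an illustrative choice, not the book's exact architecture.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # the ten digits in MNIST

model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),            # 28x28 grayscale image
    layers.Dense(128, activation="relu"),             # hidden layer (illustrative)
    layers.Dense(num_classes, activation="softmax"),  # one node per class
])

# For a single image, the model returns ten values between 0 and 1
# that add up to 1 and can be interpreted as class probabilities.
x = np.random.rand(1, 28, 28).astype("float32")       # a dummy "image"
probs = model.predict(x)
print(probs.shape)  # (1, 10)
print(probs.sum())  # ~1.0
```

Because of the softmax in the last layer, the ten output values form a valid probability distribution for every input, which is exactly what makes such a classification NN a probabilistic model by construction.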