5 Probability Distributions for Machine Learning and Data Science
In machine learning, we take large feature vectors as inputs and ascribe them to one or more pre-defined classes. Such a machine is called a classifier.
As stated earlier, we can view the feature vectors as points in a high-dimensional space. Now suppose that with each point in the input space we associate the probabilities of it belonging to each of the possible classes. Then, given any input, we simply pick the class with the highest probability.
In effect, we have a classifier. Thus, we are modelling the probability distributions of the classes over the input space.
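To make this concrete, the following is a minimal sketch of such a probabilistic classifier in Python. The linear scoring W @ x + b and the softmax mapping are illustrative assumptions; any model that assigns a probability to each class, followed by picking the most probable class, fits the same pattern.

    import numpy as np

    def softmax(z):
        """Map raw scores to a probability distribution over classes."""
        z = z - np.max(z)            # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def classify(x, W, b, class_names):
        """Return the most probable class and the full distribution.

        W and b are assumed to come from some trained linear model;
        x is the input feature vector.
        """
        probs = softmax(W @ x + b)
        return class_names[np.argmax(probs)], probs

    # Toy usage with untrained, randomly initialised parameters.
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 4)), np.zeros(3)
    label, probs = classify(rng.normal(size=4), W, b,
                            ["horse", "zebra", "ape"])
    print(label, probs)    # e.g. 'zebra' with its probability vector

Note that the classifier returns the whole probability vector, not just the winning class; the next paragraph shows why that extra information is valuable.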
Such probability-distribution-modelling classifiers give us more insight into the actual underlying phenomenon than classifiers that simply emit the class to which the input belongs. To see this, consider the problem of identifying horses, zebras and apes from photos. Given an input image, the non-probabilistic classifier simply emits a best guess as to whether it is the photo of a horse, a zebra or an ape. The probabilistic classifier, on the other hand, emits the probabilities of the input image being a horse, a zebra and an ape. From these we might observe that for every horse image the probability of a zebra is relatively high, and vice versa, while for an ape image the probabilities of both horse and zebra are low. This leads to the insight that horses and zebras look somewhat similar, while apes look quite different from both.
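The following sketch shows how such an insight might be read off from a classifier's outputs. The probability vectors below are invented purely for illustration; in practice they would come from the classifier's predictions on labelled photos.

    import numpy as np

    # Hypothetical predicted probabilities for a few labelled photos,
    # each row in the order (horse, zebra, ape). Numbers are illustrative.
    horse_preds = np.array([[0.70, 0.25, 0.05],
                            [0.60, 0.35, 0.05]])
    ape_preds   = np.array([[0.05, 0.05, 0.90],
                            [0.10, 0.08, 0.82]])

    # Average probability assigned to each class, grouped by true class.
    print("mean over horse photos:", horse_preds.mean(axis=0))
    print("mean over ape photos:  ", ape_preds.mean(axis=0))
    # The consistently high zebra probability on horse photos (and vice
    # versa) signals that the two classes look similar to the model,
    # while the low horse/zebra probabilities on ape photos show that
    # apes are easily distinguished from both.

A classifier that emitted only the winning label would give us no way to perform this kind of analysis.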