5 Probability Distributions for Machine Learning and Data Science
In machine learning, we take large feature vectors as inputs and ascribe them to one or more pre-defined classes. Such a machine is called a classifier.
As stated earlier, we can view the feature vectors as points in a high-dimensional space. Now suppose that with each point in the input space we associate the probabilities of it belonging to each of the possible classes. Then, given any input, we simply pick the class with the highest probability.
In effect, we have a classifier. Thus, we are modelling the probability distributions of the classes over the input space.
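To make this concrete, the following is a minimal sketch of such a probabilistic classifier in Python. The linear scoring W @ x + b and the softmax mapping are illustrative assumptions; any model that assigns a probability to each class, followed by picking the most probable class, fits the same pattern.

    import numpy as np

    def softmax(z):
        """Map raw scores to a probability distribution over classes."""
        z = z - np.max(z)            # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def classify(x, W, b, class_names):
        """Return the most probable class and the full distribution.

        W and b are assumed to come from some trained linear model;
        x is the input feature vector.
        """
        probs = softmax(W @ x + b)
        return class_names[np.argmax(probs)], probs

    # Toy usage with untrained, randomly initialised parameters.
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 4)), np.zeros(3)
    label, probs = classify(rng.normal(size=4), W, b,
                            ["horse", "zebra", "ape"])
    print(label, probs)    # e.g. 'zebra' with its probability vector

Note that the classifier returns the whole probability vector, not just the winning class; the next paragraph shows why that extra information is valuable.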
Such probability-distribution-modelling classifiers give us more insight into the actual underlying phenomenon than classifiers that simply emit the class to which the input belongs. To see this, consider the problem of identifying horses, zebras and apes from photos. Given an input image, the non-probabilistic classifier simply emits a best guess as to whether it is the photo of a horse, a zebra or an ape. The probabilistic classifier, on the other hand, emits the probabilities of the input image being a horse, a zebra and an ape. From these we might observe that for every horse image the probability of a zebra is relatively high, and vice versa, while for an ape image the probabilities of both horse and zebra are low. This leads to the insight that horses and zebras look somewhat similar, while apes look quite different from both.
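The following sketch shows how such an insight might be read off from a classifier's outputs. The probability vectors below are invented purely for illustration; in practice they would come from the classifier's predictions on labelled photos.

    import numpy as np

    # Hypothetical predicted probabilities for a few labelled photos,
    # each row in the order (horse, zebra, ape). Numbers are illustrative.
    horse_preds = np.array([[0.70, 0.25, 0.05],
                            [0.60, 0.35, 0.05]])
    ape_preds   = np.array([[0.05, 0.05, 0.90],
                            [0.10, 0.08, 0.82]])

    # Average probability assigned to each class, grouped by true class.
    print("mean over horse photos:", horse_preds.mean(axis=0))
    print("mean over ape photos:  ", ape_preds.mean(axis=0))
    # The consistently high zebra probability on horse photos (and vice
    # versa) signals that the two classes look similar to the model,
    # while the low horse/zebra probabilities on ape photos show that
    # apes are easily distinguished from both.

A classifier that emitted only the winning label would give us no way to perform this kind of analysis.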