
This is an excerpt from Manning's book Human-in-the-Loop Machine Learning MEAP V09.
Figure 3.1: How softmax creates probability distributions in two types of architectures. In the top example, softmax is the activation function of the output (final) layer, directly outputting a probability distribution. In the bottom example, a linear activation function is used on the output layer, creating model scores (logits) that are then converted into probability distributions via softmax. The bottom architecture is only slightly more complicated, but it is preferred for Active Learning as it is more informative.
As Figure 3.1 shows, softmax is often used as the activation function on the final layer of the model, so that the scores associated with the predicted labels form a probability distribution. Softmax can also be applied afterward, converting the outputs of a linear activation function (the logits) into a probability distribution.
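The difference between the two architectures can be made concrete in a few lines of PyTorch. The sketch below is illustrative only: the `nn.Sequential` models and the input/label sizes are assumptions for this example, not code from the book.

```python
import torch
import torch.nn as nn

NUM_INPUTS, NUM_LABELS = 10, 3  # assumed sizes, for illustration only

# Top architecture in Figure 3.1: softmax is the final-layer activation,
# so the model outputs a probability distribution directly.
model_softmax = nn.Sequential(
    nn.Linear(NUM_INPUTS, NUM_LABELS),
    nn.Softmax(dim=1),
)

# Bottom architecture: a linear output layer produces raw scores (logits),
# which are converted into a probability distribution only when needed.
model_linear = nn.Sequential(
    nn.Linear(NUM_INPUTS, NUM_LABELS),
)

x = torch.randn(1, NUM_INPUTS)
probs_direct = model_softmax(x)              # already a probability distribution
logits = model_linear(x)                     # raw scores, preserved for Active Learning
probs_from_logits = torch.softmax(logits, dim=1)
```

Keeping the logits around, as in the second model, is what makes the bottom architecture more informative for Active Learning: the raw scores are available alongside the probabilities.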
```python
import math
import torch

def softmax(scores, base=math.e):
    """Returns softmax array for array of scores

    Converts a set of raw scores from a model (logits) into a
    probability distribution via softmax.

    The probability distribution will be a set of real numbers
    such that each is in the range 0-1.0 and the sum is 1.0.

    Assumes input is a pytorch tensor: tensor([1.0, 4.0, 2.0, 3.0])

    Keyword arguments:
        scores -- a pytorch tensor of any positive/negative real numbers
        base -- the base for the exponential (default e)
    """
    exps = base ** scores.to(dtype=torch.float)  # exponential for each value in array
    sum_exps = torch.sum(exps)  # sum of all exponentials
    prob_dist = exps / sum_exps  # normalize exponentials into a probability distribution
    return prob_dist
```
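As a quick sanity check (not from the book), we can call the function on the example tensor from its docstring. The outputs sum to 1.0, as a probability distribution must, and raising the base concentrates more of the probability mass on the highest score:

```python
scores = torch.tensor([1.0, 4.0, 2.0, 3.0])  # example input from the docstring

print(softmax(scores))             # ~tensor([0.0321, 0.6439, 0.0871, 0.2369])
print(softmax(scores, base=10))    # ~tensor([0.0009, 0.9001, 0.0090, 0.0900]): sharper
print(torch.sum(softmax(scores)))  # ~1.0, as required of a probability distribution
```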