3 Uncertainty sampling

 

This chapter covers

  • Understanding the scores of a model prediction
  • Combining predictions over multiple labels into a single uncertainty score
  • Combining predictions from multiple models into a single uncertainty score
  • Calculating uncertainty with different kinds of machine learning algorithms
  • Deciding how many items to put in front of humans per iteration cycle
  • Evaluating the success of uncertainty sampling

The most common strategy for making AI smarter is to have the machine learning model tell humans when it is uncertain about a task and then ask them for the correct label. In general, unlabeled data that confuses an algorithm is the most valuable data to label and add to the training data; if the algorithm can already label an item with high confidence, that label is probably correct.
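To make this concrete, here is a minimal sketch of that strategy. It assumes a hypothetical predict_probs(item) function that returns a probability for each label; the budget parameter and function names are illustrative, not part of any particular library.

```python
# Minimal sketch of uncertainty sampling, assuming a hypothetical
# predict_probs(item) that returns one probability per label.

def most_confident(probs):
    """Probability of the single most likely label."""
    return max(probs)

def select_for_annotation(unlabeled_items, predict_probs, budget=100):
    """Return the items whose top prediction has the lowest confidence.

    Items the model is already confident about are skipped, because those
    labels are probably correct; items that confuse the model are the ones
    routed to human annotators.
    """
    scored = [(most_confident(predict_probs(item)), item)
              for item in unlabeled_items]
    scored.sort(key=lambda pair: pair[0])   # least confident first
    return [item for _, item in scored[:budget]]
```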

This chapter is dedicated to the problem of interpreting when our model is telling us that it is uncertain about its task. It is not always easy to know when a model is uncertain, or how to calculate that uncertainty. Beyond simple binary labeling tasks, the different ways of measuring uncertainty can produce vastly different results. You need to understand and consider all of these methods to select the right one for your data and objectives; a small preview appears in the sketch below.
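As a preview of the algorithms covered in section 3.2, the following sketch computes four common uncertainty scores from the same probability distribution: least confidence, margin of confidence, ratio of confidence, and entropy. The two example distributions are made up for illustration, and these are simplified forms of the scores (without the normalization variants discussed later in the chapter); the point is that they can rank the same items differently.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the most likely label."""
    return 1.0 - max(probs)

def margin_of_confidence(probs):
    """1 minus the gap between the two most likely labels (higher = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def ratio_of_confidence(probs):
    """Second most likely probability divided by the most likely one."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top

def entropy_score(probs):
    """Entropy of the distribution, normalized to 0-1 by dividing by log2(n)."""
    raw = -sum(p * math.log2(p) for p in probs if p > 0)
    return raw / math.log2(len(probs))

# Two made-up distributions over four labels: the scores rank them differently.
a = [0.45, 0.40, 0.10, 0.05]   # two labels competing closely
b = [0.60, 0.15, 0.15, 0.10]   # one clear leader, but a long tail
for name, fn in [("least confidence", least_confidence),
                 ("margin", margin_of_confidence),
                 ("ratio", ratio_of_confidence),
                 ("entropy", entropy_score)]:
    print(f"{name}: a={fn(a):.3f}  b={fn(b):.3f}")
```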

3.1 Interpreting uncertainty in a machine learning model

3.1.1 Why look for uncertainty in your model?

3.1.2 Softmax and probability distributions

3.1.3 Interpreting the success of active learning

3.2 Algorithms for uncertainty sampling

3.2.1 Least confidence sampling

3.2.2 Margin of confidence sampling

3.2.3 Ratio sampling

3.2.4 Entropy (classification entropy)

3.2.5 A deep dive on entropy

3.3 Identifying when different types of models are confused

3.3.1 Uncertainty sampling with logistic regression and MaxEnt models

3.3.2 Uncertainty sampling with SVMs
