concept baseline in category machine learning

This is an excerpt from Manning's book Human-in-the-Loop Machine Learning MEAP V09.
Random sampling sounds like the simplest strategy, but it can be the trickiest: what counts as random if your data is pre-filtered, if it changes over time, or if you know for some other reason that a random sample will not be representative of the problem you are addressing? These questions are addressed in more detail in the following sub-section. Regardless of the strategy, some amount of randomly selected data should always be annotated, both to gauge the accuracy of your model and to compare your Active Learning strategies against a baseline of randomly selected items.
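As a minimal sketch of setting aside such a random baseline, the Python below shuffles a pool of item IDs and reserves a fixed fraction for annotation. The function name `split_random_baseline`, the 10% fraction, and the fixed seed are illustrative assumptions rather than anything prescribed by the book. The sample is only uniform over the pool you pass in, so it does nothing to correct for pre-filtering or for data that changes over time.

```python
import random


def split_random_baseline(item_ids, baseline_fraction=0.1, seed=0):
    """Split items into a uniformly random baseline set and the remainder.

    baseline_fraction and seed are illustrative defaults (assumptions),
    not values taken from the source text.
    """
    if not 0.0 <= baseline_fraction <= 1.0:
        raise ValueError("baseline_fraction must be between 0 and 1")
    rng = random.Random(seed)  # local generator so the split is reproducible
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    n_baseline = round(len(shuffled) * baseline_fraction)
    # The first slice is annotated as the random baseline; the rest is
    # left for whatever Active Learning strategy is being compared.
    return shuffled[:n_baseline], shuffled[n_baseline:]


# Hypothetical pool of 1,000 unlabeled item IDs
baseline_items, remaining_items = split_random_baseline(range(1000), 0.1)
```

Keeping the baseline items separate from the pool that the Active Learning strategies draw from makes the later comparison fair, because neither side sees the other's items.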
This entire chapter and the next use the concept of expected and actual annotation accuracy. For example, if someone guessed randomly for each annotation, they would still get some percentage correct, so you might want to adjust their accuracy to reflect the baseline for random-chance guessing. The concepts of expected and actual behavior apply to many different types of tasks and annotation scenarios.
The adjusted accuracy normalizes the annotator’s score so that the baseline from random guessing becomes 0. Let’s assume that someone was 90% accurate overall. Their actual accuracy, adjusted for chance, is shown in Figure 8.5:
Figure 8.5: Different ways of establishing a baseline expected from random guessing, or “chance-adjusted accuracy,” when testing annotators against ground-truth data. Top: a visualization of how we normalize the result. If someone were randomly choosing a label, they would sometimes pick the correct one, so we measure accuracy as the distance between the random accuracy and 1. Bottom: how this might look with our example data. Note that the normalized score of 60% when the baseline is always guessing “Pedestrian” is very different from the 90% raw accuracy score, and from the 86.7% obtained when normalizing according to the number of labels. This highlights why choosing the correct baseline for “expected” accuracy is so important. There are different cases where each of the three baselines is typically the correct one, so it is important to know about all three.
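The normalization described in the caption can be sketched in a few lines of Python, assuming the usual form (actual - expected) / (1 - expected). The function names are hypothetical, and the excerpt does not state the label set or label frequencies: the four labels and the 75% share for “Pedestrian” used below are back-solved from the caption's 86.7% and 60% figures and should be read as assumptions. Only the two baselines named in the caption are shown.

```python
def chance_adjusted_accuracy(actual, expected):
    """Rescale accuracy so the expected (chance) baseline maps to 0 and 1 stays 1."""
    if not 0.0 <= expected < 1.0:
        raise ValueError("expected accuracy must be in [0, 1)")
    return (actual - expected) / (1.0 - expected)


def uniform_baseline(num_labels):
    """Expected accuracy when guessing uniformly at random among the labels."""
    return 1.0 / num_labels


def most_frequent_baseline(label_counts):
    """Expected accuracy when always guessing the most common label."""
    total = sum(label_counts.values())
    return max(label_counts.values()) / total


raw_accuracy = 0.90  # the annotator's raw accuracy from the text

# Assumption: four labels, inferred from the caption's 86.7% figure.
by_num_labels = chance_adjusted_accuracy(raw_accuracy, uniform_baseline(4))

# Assumption: "Pedestrian" is 75% of the ground truth, inferred from the
# caption's 60% figure. "Other" is a placeholder for the remaining labels.
hypothetical_counts = {"Pedestrian": 75, "Other": 25}
by_most_frequent = chance_adjusted_accuracy(
    raw_accuracy, most_frequent_baseline(hypothetical_counts)
)

print(f"{by_num_labels:.1%}  {by_most_frequent:.1%}")
```

Under these assumptions the two results round to the caption's 86.7% and 60%, which shows how much the adjusted score depends on the baseline chosen as “expected.”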