Part 3 Annotation


Annotation puts the human in human-in-the-loop machine learning. Creating datasets with accurate and representative labels for machine learning is often the most underestimated component of a machine learning application.

Chapter 7 covers how to find and manage the right people to annotate data. Chapter 8 covers the basics of quality control for annotation, introducing the most common ways to calculate accuracy and agreement for an entire dataset, between annotators, per label, and per task. Unlike machine learning accuracy, accuracy and agreement for human annotators typically need to be adjusted for random chance, which makes the metrics for evaluating human performance more complicated.
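To make the chance adjustment concrete, here is a minimal sketch of one common chance-adjusted metric, Cohen's kappa, for two annotators labeling the same items. This is only one of several metrics for agreement, and the example labels are invented for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-adjusted agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability of agreeing by chance, estimated from
    # each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Rescale so 0 means chance-level agreement and 1 means perfect agreement.
    return (observed - expected) / (1 - expected)

# Illustrative labels: raw agreement is 5/6 (about 0.83), but because
# both annotators favor "pos", kappa is lower: about 0.67.
annotator_a = ["pos", "pos", "neg", "pos", "neg", "pos"]
annotator_b = ["pos", "neg", "neg", "pos", "neg", "pos"]
print(cohens_kappa(annotator_a, annotator_b))
```

The gap between raw agreement and kappa is exactly the chance adjustment the chapter discusses: annotators with skewed label distributions agree often by luck alone, so raw agreement overstates their reliability.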

Chapter 9 covers advanced strategies for annotation quality control, starting with techniques to elicit subjective annotations and then expanding to machine learning models for quality control. The chapter also covers a wide range of methods to semi-automate annotation with rule-based systems, search-based systems, transfer learning, semi-supervised learning, self-supervised learning, and synthetic data creation. These methods are among the most exciting research areas on the machine learning side of human–computer interaction today.
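As a small taste of the rule-based approach to semi-automating annotation, the sketch below uses keyword rules to propose candidate labels that a human annotator then confirms or corrects. The rules and labels are invented for illustration; Chapter 9 covers the technique properly:

```python
# Rule-based pre-annotation sketch: each rule maps a keyword to a candidate
# label. Proposals speed up annotation; a human still reviews every item.
RULES = [
    ("refund", "billing"),      # illustrative keyword -> candidate label
    ("password", "account"),
    ("crash", "bug_report"),
]

def propose_label(text):
    """Return the first matching candidate label, or None if no rule fires."""
    lowered = text.lower()
    for keyword, label in RULES:
        if keyword in lowered:
            return label
    return None  # no proposal: the item is annotated fully manually

print(propose_label("The app will crash when I log in"))
```

Even simple rules like these can reduce annotation effort substantially, but because the proposals bias what annotators see, quality control on the human review step matters just as much as the rules themselves.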