10 Annotation Quality for Different Machine Learning Tasks

This chapter covers:

Adapting annotation quality control methods from labeling to continuous tasks.
Managing annotation quality for Computer Vision tasks like object detection and semantic segmentation.
Managing annotation quality for Natural Language Processing tasks like Sequence Labeling and Text Generation.
Understanding annotation quality for other Machine Learning tasks in Speech, Video and Information Retrieval.

Most machine learning tasks are more complicated than labeling an entire image or document. Imagine that you need to generate subtitles for movies in a creative way. Creating transcriptions of spoken and signed language is a language generation task. If you wanted to emphasize angry language with bold text, that would be an additional sequence labeling task. Imagine also that you want to display the transcriptions like the “speech bubbles” of text found in comics. You could use object detection to make sure that each speech bubble comes from the right person, and semantic segmentation to ensure that the speech bubble is placed over the background of the scene instead of over people or important objects. You might also want to predict how a given person would rate the film as part of a recommendation system, or feed the content into a search engine that can find matches for abstract phrases like “motivational speeches”.

10.1 Annotation Quality for Continuous Tasks

10.1.1 Ground-truth for Continuous Tasks

10.1.2 Agreement for Continuous Tasks

10.1.3 Subjectivity in continuous tasks

10.1.4 Aggregating continuous judgements to create training data

10.1.5 Machine learning for aggregating continuous tasks to create training data

10.2 Annotation Quality for Object Detection

10.2.1 Ground-truth for Object Detection

10.2.2 Agreement for Object Detection

10.2.3 Dimensionality and Accuracy in Object Detection

10.2.4 Subjectivity for Object Detection

10.2.5 Aggregating object annotations to create training data

10.2.6 Machine learning for object annotations

10.3 Annotation Quality for Semantic Segmentation

10.3.1 Ground-truth for semantic segmentation annotation

10.3.2 Agreement for Semantic Segmentation

10.3.3 Subjectivity for Semantic Segmentation annotations

10.3.4 Aggregating Semantic Segmentation to create training data

10.4 Annotation Quality for Sequence Labeling