chapter nine

9 Advanced Data Annotation and Augmentation

This chapter covers:

  • Evaluating annotation quality for subjective tasks.
  • Optimizing annotation quality control with machine learning.
  • Treating model predictions as annotations.
  • Combining embeddings/contextual representations with annotation.
  • Using search and rule-based systems for data annotation.
  • Bootstrapping models and supporting exploratory data analysis with lightly supervised machine learning.
  • Expanding datasets with synthetic data, data creation, and data augmentation.
  • Incorporating annotation information into machine learning models.

9.1   Annotation Quality for Subjective Tasks

9.1.1   Requesting annotator expectations

9.1.2   Assessing viable labels for subjective tasks

9.1.3   Trusting an annotator to understand the diversity of possible responses

9.1.4   Bayesian Truth Serum for subjective judgments

9.1.5   Embedding simple tasks in more complicated ones

9.2   Machine Learning for annotation quality control

9.2.1   Calculating annotation confidence as an optimization task

9.2.2   Converging on label confidence when annotators disagree

9.2.3   Predicting whether a single annotation is correct or incorrect

9.3   Model predictions as annotations

9.3.1   Trusting annotations from confident model predictions

9.3.2   Treating model predictions as a single annotator

9.3.3   Cross-validating to find mislabeled data

9.4   Embeddings/Contextual Representations

9.4.1   Transfer learning from an existing model

9.6.2   Human-guided exploratory data analysis