8 Quality Control for Data Annotation
This chapter covers
- Calculating the accuracy of an annotator compared to ground-truth data.
- Calculating the overall agreement and reliability of a dataset as a whole.
- Measuring inter-annotator agreement on a per-task basis to generate a confidence score for each training data label.
- Designing architectures that incorporate subject-matter experts into the annotation workflow.
- Breaking up a task into simpler subtasks to improve accuracy, efficiency, and quality control.
You have your machine learning model ready to go and people lined up to annotate your data, so you are almost ready to deploy! But your model will only be as accurate as the data it is trained on: if you can't get high-quality annotations, you won't get an accurate model. You just need to give the same task to multiple people and take the majority vote, right?
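As a starting point, the majority-vote idea can be sketched in a few lines. This is a minimal illustration, not the full quality-control approach this chapter develops; the `annotations` data and the `majority_vote` helper are invented for the example.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label and the fraction of annotators who chose it.

    The fraction is a naive per-task agreement score: 1.0 means all
    annotators agreed, lower values mean the label is less certain.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Hypothetical annotations: task ID -> labels from three different annotators.
annotations = {
    "task_1": ["cat", "cat", "dog"],
    "task_2": ["dog", "dog", "dog"],
}

for task_id, labels in annotations.items():
    label, agreement = majority_vote(labels)
    print(f"{task_id}: {label} (agreement {agreement:.2f})")
```

The naive agreement score already hints at the problem with simple voting: it ignores how skilled each annotator is and how likely they are to agree by chance, which is exactly what the rest of this chapter addresses.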