2 Getting started with human-in-the-loop machine learning


This chapter covers

  • Ranking predictions by model confidence to identify confusing items
  • Finding unlabeled items with novel information
  • Building a simple interface to annotate training data
  • Evaluating changes in model accuracy as you add more training data
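The first of these ideas, ranking predictions by model confidence, can be sketched in a few lines. This is a minimal illustration only, not the chapter's actual code: the function names and the toy predictions are hypothetical, and the score used here is one common choice (normalized least confidence) among several uncertainty measures.

```python
# A minimal sketch of confidence ranking. All names here are
# hypothetical stand-ins for a real model's prediction function.

def least_confidence(prob_dist):
    """Uncertainty score in [0, 1]: 0 = fully confident, 1 = maximally unsure.

    Normalized least confidence: how far the top probability falls
    below certainty, scaled by the number of labels so that the score
    is comparable across tasks with different label counts.
    """
    num_labels = len(prob_dist)
    most_confident = max(prob_dist)
    return (1.0 - most_confident) * (num_labels / (num_labels - 1))

def rank_by_confidence(items, predict_proba):
    """Return items sorted most-uncertain-first for human review.

    `predict_proba` maps an item to its predicted probability
    distribution over the labels.
    """
    scored = [(least_confidence(predict_proba(item)), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]

# Toy usage with hard-coded "predictions" for two articles:
fake_probs = {"article A": [0.9, 0.1], "article B": [0.55, 0.45]}
ranked = rank_by_confidence(list(fake_probs), fake_probs.get)
# "article B" (0.55 vs. 0.45) is more confusing to the model than
# "article A" (0.9 vs. 0.1), so it is ranked first for annotation.
```

Items near the top of this ranking are the ones the model finds most confusing, which is exactly where a human label tends to help the most.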

For any machine learning task, you should start with a simple but functional system and build out more sophisticated components as you go. This guideline applies to most technology: ship the minimum viable product (MVP) and then iterate on that product. The feedback you get from what you ship first will tell you which pieces are the most important to build out next.

This chapter is dedicated to building your first human-in-the-loop machine learning MVP. We will build on this system as this book progresses, allowing you to learn about the different components that are needed to build more sophisticated data annotation interfaces, active learning algorithms, and evaluation strategies.

Sometimes, a simple system is enough. Suppose that you work at a media company, and your job is to tag news articles according to their topic. You already have topics such as sports, politics, and entertainment. Natural disasters have been in the news lately, and your boss has asked you to annotate the relevant past news articles as disaster-related to allow better search for this new tag. You don’t have months to build out an optimal system; you want to get an MVP out as quickly as possible.
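The MVP in a scenario like this boils down to one repeated loop: retrain on the labels you have, rank the unlabeled articles by how unsure the model is, and send the most confusing batch to a human. The sketch below is hypothetical scaffolding, not the book's code: `train`, `predict_proba`, and `ask_human` stand in for a real model and annotation interface.

```python
# A minimal sketch of one human-in-the-loop iteration, under the
# assumption that `train`, `predict_proba`, and `ask_human` are
# supplied by the caller (all hypothetical names).

def annotation_loop(labeled, unlabeled, train, predict_proba, ask_human,
                    batch_size=10):
    """One iteration: retrain, rank unlabeled items by uncertainty,
    send the most confusing batch to a human, and fold the new
    labels back into the training data."""
    model = train(labeled)  # retrain on all labels gathered so far
    # Rank unlabeled items by the model's top probability, ascending:
    # a low maximum probability means the model is unsure.
    ranked = sorted(unlabeled,
                    key=lambda item: max(predict_proba(model, item)))
    for item in ranked[:batch_size]:   # most uncertain items first
        label = ask_human(item)        # e.g. "disaster-related" or not
        labeled.append((item, label))
        unlabeled.remove(item)
    return labeled, unlabeled
```

Running this loop a few times, and evaluating accuracy on a held-out set between iterations, is essentially the MVP this chapter builds.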

2.1 Beyond hacktive learning: Your first active learning algorithm

2.2 The architecture of your first system

2.3 Interpreting model predictions and data to support active learning

2.3.1 Confidence ranking

2.3.2 Identifying outliers

2.3.3 What to expect as you iterate

2.4 Building an interface to get human labels

2.4.1 A simple interface for labeling text

2.4.2 Managing machine learning data