1 Introduction to human-in-the-loop machine learning


This chapter covers

  • Annotating unlabeled data to create training, validation, and evaluation data
  • Sampling the most important unlabeled data items (active learning)
  • Incorporating human–computer interaction principles into annotation
  • Implementing transfer learning to take advantage of information in existing models

Unlike robots in the movies, most of today’s artificial intelligence (AI) cannot learn by itself; instead, it relies on intensive human feedback. Probably 90% of machine learning applications today are powered by supervised machine learning, in which a model learns from examples that humans have labeled. These applications cover a wide range of use cases. An autonomous vehicle can drive you safely down the street because humans have spent thousands of hours telling it when its sensors are seeing a pedestrian, moving vehicle, lane marking, or other relevant object. Your in-home device knows what to do when you say “Turn up the volume” because humans have spent thousands of hours telling it how to interpret different commands. And your machine translation service can translate between languages because it has been trained on thousands (or maybe millions) of human-translated texts.

1.1 The basic principles of human-in-the-loop machine learning

1.2 Introducing annotation

1.2.1 Simple and more complicated annotation strategies

1.2.2 Plugging the gap in data science knowledge

1.2.3 Quality human annotation: Why is it hard?

1.3 Introducing active learning: Improving the speed and reducing the cost of training data

1.3.1 Three broad active learning sampling strategies: Uncertainty, diversity, and random

1.3.2 What is a random selection of evaluation data?

1.3.3 When to use active learning

1.4 Machine learning and human–computer interaction

1.4.1 User interfaces: How do you create training data?

1.4.2 Priming: What can influence human perception?

1.4.3 The pros and cons of creating labels by evaluating machine learning predictions

1.4.4 Basic principles for designing annotation interfaces