4 Diversity sampling

This chapter covers

  • Using outlier detection to sample data that is unknown to your current model
  • Using clustering to sample more diverse data before annotation starts
  • Using representative sampling to target data that most resembles where your model will be deployed
  • Improving real-world diversity with stratified sampling and active learning
  • Using diversity sampling with different types of machine learning architectures
  • Evaluating the success of diversity sampling

In chapter 3, you learned how to identify where your model is uncertain: what your model “knows it doesn’t know.” In this chapter, you will learn how to identify what is missing from your model: what your model “doesn’t know that it doesn’t know,” or the “unknown unknowns.” This problem is hard, and it is made harder by the fact that what your model needs to know is often a moving target in a constantly changing world. Just as humans learn new words, new objects, and new behaviors in response to a changing environment, most machine learning algorithms are deployed into environments that are changing, too.

4.1 Knowing what you don’t know: Identifying gaps in your model’s knowledge

4.1.1 Example data for diversity sampling

4.1.2 Interpreting neural models for diversity sampling

4.1.3 Getting information from hidden layers in PyTorch
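As a sketch of the general technique covered here (the model architecture, layer sizes, and the return_all_layers flag are illustrative assumptions, not the book’s code): a PyTorch forward() method can be written so that it optionally returns the hidden-layer activations alongside the usual predictions, giving you the raw material for the outlier scoring in section 4.2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleClassifier(nn.Module):
    """Hypothetical one-hidden-layer text classifier."""

    def __init__(self, vocab_size=100, hidden_size=128, num_labels=2):
        super().__init__()
        self.linear1 = nn.Linear(vocab_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, num_labels)

    def forward(self, feature_vec, return_all_layers=False):
        hidden = torch.relu(self.linear1(feature_vec))
        log_probs = F.log_softmax(self.linear2(hidden), dim=1)
        if return_all_layers:
            # expose the hidden layer so callers can inspect activations
            return hidden, log_probs
        return log_probs

model = SimpleClassifier()
item = torch.rand(1, 100)  # a fake feature vector for one item
hidden, log_probs = model(item, return_all_layers=True)
print(hidden.shape)  # torch.Size([1, 128])
```

An alternative that avoids changing forward() is PyTorch’s register_forward_hook(), which captures a layer’s activations without modifying the model’s code.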

 
 
 

4.2 Model-based outlier sampling

4.2.1 Use validation data to rank activations
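One concrete way to implement this idea, sketched in plain NumPy (the function and variable names are invented for illustration; this is not the book’s implementation): rank each unlabeled item’s per-neuron activation against the validation set’s activations, average the percentile ranks across neurons, and treat items whose activations rank low everywhere as model-based outliers.

```python
import numpy as np

def model_outlier_scores(val_activations, unlabeled_activations):
    """Score unlabeled items by how low their neuron activations rank
    against the validation set (higher score = more of an outlier)."""
    # val_activations: (n_val, n_neurons); unlabeled: (n_items, n_neurons)
    # per neuron, the fraction of validation activations below the item's
    ranks = (unlabeled_activations[:, None, :]
             > val_activations[None, :, :]).mean(axis=1)
    # average the percentile ranks across neurons; invert so outliers score high
    return 1.0 - ranks.mean(axis=1)

rng = np.random.default_rng(0)
val = rng.normal(size=(50, 8))          # activations on validation data
unl = np.vstack([rng.normal(size=(3, 8)),
                 rng.normal(loc=-5.0, size=(1, 8))])  # one shifted item
scores = model_outlier_scores(val, unl)
print(scores.argmax())  # the shifted item gets the highest outlier score
```

Ranking against validation data rather than training data matters: the model was fit to the training items, so their activations are systematically higher and would make everything else look like an outlier.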

 
 
 

4.2.2 Which layers should I use to calculate model-based outliers?

4.2.3 The limitations of model-based outliers

4.3 Cluster-based sampling

4.3.1 Cluster members, centroids, and outliers
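To make the three roles concrete, here is a self-contained NumPy sketch (every name is illustrative, and Euclidean k-means stands in for whichever clustering algorithm you prefer): after clustering, sample the item nearest each centroid (the most typical member), the item farthest from it (the cluster’s outlier), and one random member per cluster to broaden coverage.

```python
import numpy as np

def kmeans(data, k=3, iters=20):
    """Minimal k-means; a stand-in for any clustering algorithm."""
    # simple deterministic init: evenly spaced items as starting centers
    centers = data[:: max(1, len(data) // k)][:k].copy()
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

def sample_from_clusters(data, labels, centers, rng):
    """Pick the most typical, least typical, and a random item per cluster."""
    picks = {"centroids": [], "outliers": [], "random": []}
    for j, center in enumerate(centers):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        d = np.linalg.norm(data[idx] - center, axis=1)
        picks["centroids"].append(int(idx[d.argmin()]))  # nearest the centroid
        picks["outliers"].append(int(idx[d.argmax()]))   # farthest from it
        picks["random"].append(int(rng.choice(idx)))     # broadens coverage
    return picks

rng = np.random.default_rng(1)
# three well-separated synthetic clusters of 30 items each
blobs = np.vstack([rng.normal(loc=c, size=(30, 2)) for c in (0.0, 6.0, 12.0)])
labels, centers = kmeans(blobs, k=3)
picks = sample_from_clusters(blobs, labels, centers, rng)
```

The centroid picks give annotators the most representative items, the outlier picks surface what the clusters fail to capture, and the random picks hedge against both.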

 
 
 

4.3.2 Any clustering algorithm in the universe