10 Classifying Suspected Tumors


This chapter covers:

  • Using DataLoaders to load data from Datasets with multiple child processes
  • Implementing an introductory model that performs step 3, classification
  • Setting up the basic skeleton for our training and testing application that loops per epoch, and feeds the model data from the training and testing Datasets
  • Timing long-running loops with a custom enumeration function
  • Logging metrics that describe model performance and evaluating model performance using those metrics
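To make the first bullet concrete, here is a minimal, hypothetical sketch of handing a Dataset to a DataLoader that uses worker processes. ToyDataset is a stand-in for the LunaDataset class we built last chapter, and the batch size and worker count are arbitrary placeholder values:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical stand-in for LunaDataset: yields (sample, label) tuples."""
    def __len__(self):
        return 100

    def __getitem__(self, ndx):
        return torch.randn(1, 8, 8), ndx % 2

train_dl = DataLoader(
    ToyDataset(),
    batch_size=32,
    num_workers=2,  # child processes load samples in parallel
    shuffle=True,
)

# The DataLoader collates individual samples into batched tensors.
batch_tup, label_tup = next(iter(train_dl))
```

Note that `num_workers=2` is what causes the DataLoader to spin up child processes; with `num_workers=0`, loading happens in the main process.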

We’re going to do two main things in this chapter. First, we’re going to take the Ct and LunaDataset classes we implemented in the last chapter and use them to feed DataLoader instances, which will in turn feed our model with data via a training and testing loop. This model and loop will be the foundation that the rest of part 2 builds on. Second, we’re going to use the output of that training loop to introduce the core challenge of part 2: how to get high-quality results from messy, limited data. In later chapters, we’ll explore the ways our data is limited, as well as build mitigations for those limitations.

But before that, we must lay our foundation.

Figure 10.1. Training and testing loop, with an outer loop over each epoch.
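The loop in figure 10.1 can be sketched in just a few lines. Everything here is a hypothetical placeholder for components we’ll build over the course of the chapter (the real application will be structured as a class with proper DataLoaders), but the nesting of the loops is the same:

```python
def run(epochs, train_batches, test_batches, train_step, test_step, log_metrics):
    """Skeleton of the per-epoch loop from figure 10.1: train on every
    training batch, then evaluate on every testing batch, then log
    metrics for the epoch. All arguments are stand-ins for components
    built later in the chapter."""
    for epoch_ndx in range(1, epochs + 1):
        trn_metrics = [train_step(batch) for batch in train_batches]  # training loop
        tst_metrics = [test_step(batch) for batch in test_batches]    # testing loop
        log_metrics(epoch_ndx, trn_metrics, tst_metrics)              # per-epoch logging

# Toy usage: two epochs over three "training batches" and two "testing batches".
logged = []
run(
    epochs=2,
    train_batches=range(3),
    test_batches=range(2),
    train_step=lambda b: b * 0.5,
    test_step=lambda b: b,
    log_metrics=lambda e, trn, tst: logged.append((e, trn, tst)),
)
```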

The basic structure of what we’re going to implement will be:

10.1  Setting up the main training application

10.1.1  Initializing the model and optimizer

10.1.2  Care and feeding of DataLoaders

10.2  Our first-pass neural network design

10.2.1  The core convolutions

10.3  Training and testing the model

10.3.1  The computeBatchLoss function

10.3.2  Deleting the loss variable

10.3.3  The testing loop is similar

10.4  The logMetrics function

10.5  Running the training script

10.5.1  Needed data for training

10.5.2  Interlude: the enumerateWithEstimate function

10.6  Getting 99.7% correct means we’re done, right?

10.6.1  Why is this happening?

10.7  Exercises