14 Writing production code

 

This chapter covers

  • Validating feature data before attempting to use it for a model
  • Monitoring features in production
  • Monitoring all aspects of a production model life cycle
  • Approaching projects with the goal of solving them in the simplest manner possible
  • Defining a standard code architecture for ML projects
  • Avoiding cargo cult behavior in ML

We spent the entirety of part 2 of this book on the more technician-focused aspects of building ML software. In this chapter, we’ll begin looking at ML project work through the eyes of an architect.

We’ll focus on the theory and philosophy of solving problems with ML, taking a holistic view of how our highly interconnected and intensely complex profession functions. We’ll look at case studies of production ML (all based, in one way or another, on things that I’ve messed up or have seen others mess up) to give insight into elements of ML development that aren’t frequently talked about. These are the lessons learned (usually the hard way) when we, as a profession, focus more on the algorithmic aspects of solving problems than on where our focus should be:

  • The data—How it’s generated, where it is, and what it fundamentally is
  • The complexity—Of the solution and of the code
  • The problem—How to solve it in the easiest way possible

14.1 Have you met your data?

14.1.1 Make sure you have the data

14.1.2 Check your data provenance

14.1.3 Find a source of truth and align on it

14.1.4 Don’t embed data cleansing into your production code

14.2 Monitoring your features

14.3 Monitoring everything else in the model life cycle

14.4 Keeping things as simple as possible

14.4.1 Simplicity in problem definitions

14.4.2 Simplicity in implementation