6 The universal workflow of machine learning

This chapter covers

  • Steps for framing a machine learning problem
  • Steps for developing a working model
  • Steps for deploying your model in production and maintaining it

Our previous examples have assumed that we already had a labeled dataset to start from, and that we could immediately start training a model. In the real world, this is often not the case. You don’t start from a dataset; you start from a problem.

Imagine that you’re starting your own machine learning consulting shop. You incorporate, you put up a fancy website, you notify your network. The projects start rolling in:

  • A personalized photo search engine for a picture-sharing social network—type in “wedding” and retrieve all the pictures you took at weddings, without any manual tagging needed.
  • Flagging spam and offensive text content among the posts of a budding chat app.
  • Building a music recommendation system for users of an online radio service.
  • Detecting credit card fraud for an e-commerce website.
  • Predicting display ad click-through rate to decide which ad to serve to a given user at a given time.
  • Flagging anomalous cookies on the conveyor belt of a cookie-manufacturing line.
  • Using satellite images to predict the location of as-yet-unknown archeological sites.

6.1 Define the task

6.1.1 Frame the problem

6.1.2 Collect a dataset

6.1.3 Understand your data

6.1.4 Choose a measure of success

6.2 Develop a model

6.2.1 Prepare the data

6.2.2 Choose an evaluation protocol

6.2.3 Beat a baseline