6 The universal workflow of machine learning

This chapter covers

Steps for framing a machine learning problem
Steps for developing a working model
Steps for deploying your model in production and maintaining it

Our previous examples have assumed that we already had a labeled dataset to start from, and that we could immediately start training a model. In the real world, this is often not the case. You don’t start from a dataset; you start from a problem.

Imagine that you’re starting your own machine learning consulting shop. You incorporate, you put up a fancy website, you notify your network. The following projects start rolling in:

Building a personalized photo search engine for a picture-sharing social network: type in “wedding” and retrieve all the pictures you took at weddings, with no manual tagging needed.
Flagging spam and offensive text content among the posts of a budding chat app.
Building a music recommendation system for users of an online radio station.
Detecting credit card fraud for an e-commerce website.
Predicting display ad click-through rates to decide which ad to serve to a given user at a given time.
Flagging anomalous cookies on the conveyor belt of a cookie-manufacturing line.
Using satellite images to predict the location of as-yet unknown archeological sites.

6.1 Define the task

6.1.1 Frame the problem

6.1.2 Collect a dataset

6.1.3 Understand your data

6.1.4 Choose a measure of success

6.2 Develop a model

6.2.1 Prepare the data

6.2.2 Choose an evaluation protocol

6.2.3 Beat a baseline

6.2.4 Scale up: Develop a model that overfits

6.2.5 Regularize and tune your model

6.3 Deploy the model

6.3.1 Explain your work to stakeholders and set expectations

6.3.2 Ship an inference model