chapter six

6 The universal workflow of machine learning

This chapter covers

  • Framing a machine learning problem
  • Developing a working model
  • Deploying your model in production and maintaining it

Our previous examples have assumed that we already had a labeled dataset to start from and that we could immediately start training a model. In the real world, this is often not the case. You don’t start from a dataset; you start from a problem.

Imagine that you’re launching your own machine learning consulting shop. You incorporate, you put up a fancy website, you notify your network. The projects start rolling in:

  • Building a personalized photo search engine for a picture-sharing social network: type in “wedding” and retrieve all the pictures you took at weddings, with no manual tagging needed.
  • Flagging spam and offensive text content among the posts of a budding chat app.
  • Building a music recommendation system for users of an online radio.
  • Detecting credit card fraud for an e-commerce website.
  • Predicting display ad click-through rate to decide which ad to serve to a given user at a given time.
  • Flagging anomalous cookies on the conveyor belt of a cookie-manufacturing line.
  • Using satellite images to predict the location of as-yet-unknown archaeological sites.

It would be very convenient if you could load the right dataset with a call like keras3::dataset_mydataset() and start fitting deep learning models. Unfortunately, no ready-made dataset exists for problems like these: in the real world, you’ll have to start from scratch.

6.1 Defining the task

6.1.1 Framing the problem

6.1.2 Collecting a dataset

6.1.3 Understanding your data

6.1.4 Choosing a measure of success

6.2 Developing a model

6.2.1 Preparing the data

6.2.2 Choosing an evaluation protocol

6.2.3 Beating a baseline

6.2.4 Scaling up: Developing a model that overfits

6.2.5 Regularizing and tuning your model

6.3 Deploying your model

6.3.1 Explaining your work to stakeholders and setting expectations

6.3.2 Shipping an inference model