This chapter covers
- Deploying ML models as a service using the BentoML serving framework
- Tracking data drift using Evidently
After orchestrating ML pipelines in chapter 5, we now face two critical challenges in the ML life cycle: deployment and monitoring. How do we reliably serve our models in production, and how do we ensure that they continue performing well over time?
In this chapter, we tackle the core challenges of deploying ML models into production, challenges that apply equally to traditional ML models and large language models (LLMs). We explore how to serve models efficiently as APIs using BentoML, a platform that simplifies deployment and abstracts away much of the underlying infrastructure complexity (figure 6.1). By automating key parts of the deployment workflow, BentoML lets ML engineers focus on serving logic rather than setup details.
Figure 6.1 The mental map, with our current focus on deploying the model as an API (E)
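To give a preview of what this looks like in practice, the following is a minimal sketch of a BentoML service using its 1.x service API. It assumes a scikit-learn model has already been saved to BentoML's local model store under the hypothetical tag iris_clf:latest; we walk through the full workflow step by step later in the chapter.

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Load the saved model from BentoML's local model store and wrap it
# in a runner, BentoML's abstraction for scalable model execution.
# The tag "iris_clf:latest" is a hypothetical example.
iris_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

# A Service bundles runners with the API endpoints that expose them.
svc = bentoml.Service("iris_classifier", runners=[iris_runner])

# Declaring input/output types lets BentoML validate requests and
# generate an OpenAPI specification automatically.
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(features: np.ndarray) -> np.ndarray:
    return iris_runner.predict.run(features)
```

Running `bentoml serve service:svc` then exposes `classify` as an HTTP endpoint, without any hand-written web-server code.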
We also address a common yet critical challenge in production ML systems: data drift. As real-world data distributions evolve, model performance can deteriorate. To detect this early, we introduce Evidently, a monitoring tool designed to detect and analyze data drift so that model performance can be maintained proactively in production.
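As a preview, the sketch below shows roughly how a drift check looks with Evidently's report API. The two DataFrames are hypothetical stand-ins: reference_df represents the data the model was trained on, and current_df represents recent production data with the same columns. We return to this workflow in detail later in the chapter.

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Hypothetical data: reference_df holds training-time data,
# current_df holds recent production data with the same schema.
reference_df = pd.DataFrame({"feature": [0.1, 0.2, 0.3, 0.4]})
current_df = pd.DataFrame({"feature": [0.9, 1.1, 1.0, 1.2]})

# The DataDriftPreset compares the two datasets column by column,
# choosing a statistical test appropriate to each column's type.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

# Save an interactive HTML summary showing which columns drifted.
report.save_html("drift_report.html")
```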