
6 Productionizing ML models

 

This chapter covers

  • Deploying ML models as a service using BentoML
  • Tracking data drift using Evidently

After orchestrating ML pipelines in Chapter 5, we now face two critical challenges in the ML lifecycle: deployment and monitoring. How do we reliably serve our models in production, and how do we ensure they continue performing well over time?

Now we’ll tackle the core challenges of deploying machine learning models into production, challenges that apply to traditional ML models and large language models (LLMs) alike. We’ll explore how to serve models efficiently as APIs using BentoML, a platform that simplifies deployment and abstracts away much of the underlying infrastructure complexity (Figure 6.1). By automating key parts of the deployment workflow, BentoML lets ML engineers focus on serving logic rather than setup details.

Figure 6.1 The mental map: we are now focusing on deployment of the model as an API (E)
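To make this concrete, here is a minimal sketch of what a BentoML service definition can look like, assuming a scikit-learn classifier has already been saved to the local model store under the hypothetical tag iris_clf (we build a real service step by step in section 6.1). It uses BentoML’s Service and runner API; the names and tag are illustrative, not prescriptive.

import bentoml
from bentoml.io import NumpyNdarray

# Fetch a previously saved model (hypothetical tag) from the local
# BentoML model store and wrap it in a runner, BentoML's unit of
# inference execution.
iris_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

# A Service ties one or more runners to HTTP API endpoints.
svc = bentoml.Service("iris_classifier", runners=[iris_runner])

# Expose prediction as an endpoint that accepts and returns NumPy arrays.
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array):
    return iris_runner.predict.run(input_array)

Saved as service.py, a file like this can be served locally with bentoml serve service:svc, which starts an HTTP server exposing a /predict endpoint.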

We also address a common yet critical issue in production ML systems: data drift. As real-world data distributions evolve, model performance can deteriorate. To catch this early, we introduce Evidently, a monitoring tool designed to detect and analyze data drift, enabling proactive maintenance of model performance in production environments.
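As a preview of section 6.2, the sketch below shows the core Evidently workflow: compare a reference dataset (for example, the training data) against current production data and generate a drift report. It assumes Evidently’s Report API (as in the 0.4.x releases); the CSV paths are placeholders.

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reference data (what the model was trained on) vs. current
# production data; both file paths are placeholders.
reference = pd.read_csv("reference_data.csv")
current = pd.read_csv("current_data.csv")

# Run Evidently's prebuilt data drift checks, feature by feature.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Save an interactive HTML dashboard summarizing drift per feature.
report.save_html("data_drift_report.html")

The resulting HTML report flags which features drifted and by how much; this is the signal we later wire into a Kubeflow pipeline component and an API-level check.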

6.1 BentoML as a deployment platform

6.1.1 Building a Bento

6.1.2 Deploying a Bento

6.2 Evidently for data drift monitoring

6.2.1 Data drift detection report and dashboard

6.2.2 Data drift detection as a Kubeflow pipeline component

6.2.3 Data drift detection for a model deployed as an API

6.3 Summary