
10 Model inference & serving

 

This chapter covers

  • Serving models and building model servers with BentoML
  • Observability and monitoring in BentoML
  • Packaging and deploying BentoML services
  • Using BentoML and MLflow together
  • Using only MLflow for the model lifecycle
  • Alternatives to BentoML and MLflow

Now that we have a working model training and validation pipeline, it's time to make the model available as a service. In this chapter, we will explore how to seamlessly serve your object detection model and movie recommendation model using BentoML, a powerful framework designed to build and serve machine learning models at scale.

You'll have the opportunity to deploy the two models you trained in the previous two chapters, gaining hands-on experience with real-world deployment scenarios. We'll start by building and deploying the service locally, then progress to creating a container that encapsulates the service for deployment, integrating it into your ML workflow (Figure 10.1).

Figure 10.1 The mental map: we are now focusing on model deployment (6) and making the model available as an API (E)
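
To give a sense of what's ahead, the sketch below shows the shape of a minimal BentoML service built around a model that has already been saved to the local BentoML model store. The model name movie_recommender, the scikit-learn framework, and the request fields are hypothetical placeholders; the chapter builds the real services for the object detection and movie recommendation models step by step.

import bentoml
from bentoml.io import JSON

# Load a previously saved model from the local BentoML model store
# ("movie_recommender" is a placeholder name) and wrap it in a runner,
# the unit BentoML uses to schedule inference workers.
recommender_runner = bentoml.sklearn.get("movie_recommender:latest").to_runner()

# A service groups one or more runners behind named API endpoints.
svc = bentoml.Service("movie_recommender_service", runners=[recommender_runner])

@svc.api(input=JSON(), output=JSON())
def recommend(payload: dict) -> dict:
    # Delegate the actual prediction to the runner.
    predictions = recommender_runner.predict.run([payload["user_features"]])
    return {"recommendations": predictions.tolist()}

Saved as service.py, this can be served locally with bentoml serve service:svc --reload and exercised with a plain HTTP request, which is the workflow sections 10.3 through 10.5 walk through in detail before we turn to MLflow and KServe.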

Self-service model deployment offers several advantages for engineers adopting MLOps:

10.1 Model deployment is hard

10.2 BentoML: Simplifying model deployment

10.3 A whirlwind tour of BentoML

10.3.1 BentoML service and runners

10.4 Executing a BentoML service locally

10.4.1 Loading a model with BentoML Runner

10.5 Building Bentos: Packaging your service for deployment

10.5.1 Bento Tags: Versioning and managing your Bentos

10.5.2 BentoML and MLflow inference

10.6 Using only MLflow to create an inference service

10.7 KServe: An alternative to BentoML

10.8 Summary