
10 Model inference & serving

 

This chapter covers

  • Serving models and building model servers with BentoML
  • Observability and monitoring in BentoML
  • Packaging and deploying BentoML services
  • Using BentoML and MLflow together
  • Using only MLflow for the model lifecycle
  • Alternatives to BentoML and MLflow

Now that we have a working model training and validation pipeline, it's time to make the model available as a service. In this chapter, we will explore how to seamlessly serve your object detection model and movie recommendation model using BentoML, a powerful framework designed to build and serve machine learning models at scale.

You'll have the opportunity to deploy the two models you trained in the previous two chapters, gaining hands-on experience with real-world deployment scenarios. We'll start by building and deploying the service locally, then progress to creating a container that encapsulates the service for deployment, integrating it into your ML workflow (Figure 10.1).

Figure 10.1 The mental map: we are now focusing on model deployment (6) and making the model available as an API (E)
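
To give a sense of what's ahead, the sketch below shows the shape of a minimal BentoML service built around a model that has already been saved to the local BentoML model store. The model name movie_recommender, the scikit-learn framework, and the request fields are hypothetical placeholders; the chapter builds the real services for the object detection and movie recommendation models step by step.

import bentoml
from bentoml.io import JSON

# Load a previously saved model from the local BentoML model store
# ("movie_recommender" is a placeholder name) and wrap it in a runner,
# the unit BentoML uses to schedule inference workers.
recommender_runner = bentoml.sklearn.get("movie_recommender:latest").to_runner()

# A service groups one or more runners behind named API endpoints.
svc = bentoml.Service("movie_recommender_service", runners=[recommender_runner])

@svc.api(input=JSON(), output=JSON())
def recommend(payload: dict) -> dict:
    # Delegate the actual prediction to the runner.
    predictions = recommender_runner.predict.run([payload["user_features"]])
    return {"recommendations": predictions.tolist()}

Saved as service.py, this can be served locally with bentoml serve service:svc --reload and exercised with a plain HTTP request, which is the workflow sections 10.3 through 10.5 walk through in detail before we turn to MLflow and KServe.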

Self-service model deployment offers several advantages for engineers adopting MLOps:

10.1 Model deployment is hard

10.2 BentoML: Simplifying model deployment

10.3 A whirlwind tour of BentoML

10.3.1 BentoML service and runners

10.4 Executing a BentoML service locally

10.4.1 Loading a model with BentoML Runner

10.5 Building Bentos: Packaging your service for deployment

10.5.1 Bento Tags: Versioning and managing your Bentos

10.5.2 BentoML and MLflow inference

10.6 Using only MLflow to create an inference service

10.7 KServe: An alternative to BentoML

10.8 Summary