10 Model Inference and Serving

 

This chapter covers

  • Introducing BentoML for model serving
  • Building model servers with BentoML
  • Observability and monitoring in BentoML
  • Packaging and deploying BentoML services
  • Using BentoML and MLflow together
  • Using only MLflow for model lifecycles
  • Alternatives to BentoML and MLflow

Now that we have a working model training and validation pipeline, it's time to make the model available as a service. In this chapter, we will serve your object detection model and movie recommendation model using BentoML, a framework designed for building and serving machine learning models at scale.

You'll deploy the two models you trained in the previous chapter, gaining hands-on experience with realistic deployment scenarios. We'll start by building and running the service locally, then package it as a container that can be deployed and integrated into your ML workflow.
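To give you a feel for what's ahead, here is a minimal sketch of a runner-based BentoML service, the style of service we build throughout this chapter. The model tag `movie_recommender:latest`, the endpoint name, and the input format are placeholders for illustration; it assumes a scikit-learn model was already saved to the local BentoML model store.

```python
# service.py -- a minimal sketch of a runner-based BentoML service.
# The model tag "movie_recommender:latest" is a placeholder; it assumes a
# scikit-learn model was previously saved with bentoml.sklearn.save_model().
import bentoml
from bentoml.io import JSON

# Wrap the stored model in a runner, BentoML's unit of inference execution.
recommender_runner = bentoml.sklearn.get("movie_recommender:latest").to_runner()

# The Service ties one or more runners to HTTP API endpoints.
svc = bentoml.Service("movie_recommender_service", runners=[recommender_runner])


@svc.api(input=JSON(), output=JSON())
def recommend(payload: dict) -> dict:
    # Delegate inference to the runner; "features" is a 2-D list of inputs.
    predictions = recommender_runner.predict.run(payload["features"])
    return {"recommendations": predictions.tolist()}
```

With this file saved as `service.py`, running `bentoml serve service:svc --reload` starts a local HTTP server you can exercise with curl or any HTTP client; we walk through that workflow step by step in section 10.4.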

Self-service model deployment offers several advantages for engineers building MLOps pipelines.

10.1 Model Deployment is Hard

10.2 BentoML: Simplifying Model Deployment

10.3 A Whirlwind Tour of BentoML

10.3.1 BentoML Service and Runners

10.4 Executing a BentoML Service Locally

10.4.1 Loading a Model with BentoML Runner

10.5 Building Bentos: Packaging Your Service for Deployment

10.5.1 BentoML and MLflow Inference

10.6 Using Only MLflow to Create an Inference Service

10.7 KServe: An Alternative to BentoML

10.8 Summary