10 Model Inference and Serving
This chapter covers
- Introducing BentoML for model serving
- Building model servers with BentoML
- Observability and monitoring in BentoML
- Packaging and deploying BentoML services
- Using BentoML and MLflow together
- Using only MLflow for model lifecycles
- Alternatives to BentoML and MLflow
 
Now that we have a working model training and validation pipeline, it's time to make those models available as services. In this chapter, we will explore how to serve your object detection and movie recommendation models using BentoML, a powerful framework designed to build and serve machine learning models at scale.
You'll have the opportunity to deploy the two models you trained in the previous chapter, gaining hands-on experience with real-world deployment scenarios. We'll start by building and deploying a service locally, then progress to packaging that service as a container and integrating it into your ML workflow, as sketched below.
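To make the destination concrete before we dig in, the following sketch shows roughly what a BentoML service definition looks like. It assumes BentoML 1.x and a scikit-learn recommender already saved to the local BentoML model store; the model name movie_recommender and the endpoint name recommend are placeholders for illustration, and the full, working version is developed step by step later in the chapter.

```python
# service.py -- a minimal sketch of a BentoML 1.x service definition.
# The model name "movie_recommender" and the scikit-learn framework are
# assumptions for illustration; substitute the model you trained earlier.
import numpy as np

import bentoml
from bentoml.io import NumpyNdarray

# Load the latest saved model from the local BentoML model store and wrap it
# in a runner, BentoML's unit of inference execution.
recommender_runner = bentoml.sklearn.get("movie_recommender:latest").to_runner()

# A Service groups one or more runners behind a named HTTP API.
svc = bentoml.Service("movie_recommender_service", runners=[recommender_runner])


@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def recommend(user_features: np.ndarray) -> np.ndarray:
    # Delegate inference to the runner; BentoML handles request handling,
    # batching, and worker processes around it.
    return recommender_runner.predict.run(user_features)
```

With a definition like this saved as service.py, running `bentoml serve service:svc --reload` starts a local HTTP server for development, and `bentoml build` followed by `bentoml containerize` produces the container image we'll work with later in the chapter.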
Self-service model deployment offers several advantages for engineers building MLOps workflows: