10 Model Inference and Serving
This chapter covers
- Introducing BentoML for model serving
- Building model servers with BentoML
- Observability and monitoring in BentoML
- Packaging and deploying BentoML services
- Using BentoML and MLflow together
- Using MLflow alone for the model lifecycle
- Alternatives to BentoML and MLflow
Now that we have a working model training and validation pipeline, it's time to make the model available as a service. In this chapter, we will explore how to serve your object detection model and movie recommendation model using BentoML, a framework designed to build and serve machine learning models at scale.
You'll deploy the two models you trained in the previous chapter, gaining hands-on experience with real-world deployment scenarios. We'll start by building and running a service locally, then progress to packaging the service as a container and integrating it into your ML workflow.
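To give you a sense of where we're headed, here is a minimal sketch of the kind of service definition we'll build in this chapter. It assumes a recommender model from the previous chapter has been saved to the local BentoML model store with an sklearn-style predict method; the model tag, service name, and endpoint name used here are placeholders, and the chapter develops the real services step by step.
```python
import bentoml
import numpy as np
from bentoml.io import JSON, NumpyNdarray

# Hypothetical model tag: assumes the recommender trained in the previous
# chapter was saved to the local BentoML model store under this name.
recommender_runner = bentoml.models.get("movie_recommender:latest").to_runner()

# A BentoML Service wraps one or more runners behind HTTP endpoints.
svc = bentoml.Service("movie_recommender_service", runners=[recommender_runner])

@svc.api(input=NumpyNdarray(), output=JSON())
def recommend(features: np.ndarray) -> dict:
    # The runner dispatches inference to the saved model; the exact method
    # (predict, __call__, etc.) depends on how the model was saved.
    predictions = recommender_runner.predict.run(features)
    return {"recommendations": predictions.tolist()}
```
With a file like this saved as service.py, running `bentoml serve service:svc` starts a local HTTP server that exposes the decorated function as an endpoint, which is the basic workflow we expand on throughout the chapter.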
Self-service model deployment offers several advantages for engineers building MLOps pipelines: