chapter eleven
11 Deployment and serving
This chapter covers
- SLM serving and inference with vLLM
- SLM serving with FastAPI
- SLM deployment and serving on devices with MLC LLM
- Options for SLM deployment and inference on Android devices
It’s time to look at some of the most common environments and tools for deploying and serving small, customized language models. We won’t cover them all: the closer you get to local or edge deployments, the more hardware options and frameworks you’ll encounter. Instead, I’ve focused on the options that are currently the most popular across common combinations of operating systems and hardware.