10 Deployment and Serving
This chapter covers
- Small Language Model (SLM) deployment and inference
- vLLM
- FastAPI
- MLC LLM
- Android devices
This chapter takes a deep dive into some of the most likely target environments and tools for deploying and serving small, customized language models. The list here isn’t meant to be comprehensive: the more you move toward local and/or edge deployment, the more hardware options you encounter, and the more frameworks and libraries there are to choose from. I have therefore selected those that are currently the most popular across multiple operating system and hardware combinations.