11 Deployment and Serving
This chapter covers
- Small Language Model (SLM) deployment and inference
- vLLM
- FastAPI
- MLC LLM
- Android devices

This chapter takes a deep dive into some of the most likely target environments and tools for deploying and serving small, customized language models. The list here isn't meant to be comprehensive: the closer you move to local and/or edge deployment, the more hardware options you encounter, and the more frameworks and libraries there are to pick from. I have therefore selected those that are currently the most popular across multiple operating systems and hardware combinations.