6 Large Language Models in Production: A practical guide
This chapter covers
- How to structure an LLM service and tools to deploy
- How to create and prepare a Kubernetes cluster for LLM deployment
- Common production challenges and some methods to handle them
- Deploying models to the edge
We did it. We arrived. This is the chapter we wanted to write when we first thought about writing this book. I remember the first model I ever deployed. Words can't describe how much more satisfaction it gave me than the dozens of projects left to rot on my laptop. In my mind it sits on a pedestal, not because it was good (in fact, it was quite terrible), but because it was useful and actually used by those who needed it most. It made an impact on the lives of those around me.
So what actually is production? "Production" refers to the phase where the model is integrated into a live, operational environment where it can perform its intended tasks and provide services to end users. It's a crucial phase in making the model available for real-world applications and services. To that end, we will show you how to package up an LLM into a service or API so that it can take on-demand requests. We will then show you how to set up a cluster in the cloud where you can deploy this service, and discuss common challenges you may face in production along with tips for handling them. Lastly, we will talk about a different kind of production: deploying models on edge devices.
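To give a feel for what "packaging an LLM into a service" means before we dig into the details, here is a minimal sketch using only Python's standard library. The `generate` function is a hypothetical stand-in for a real model call (in practice you would invoke your LLM here, and likely use a production-grade framework rather than `http.server`):

```python
# Minimal sketch: wrapping a model behind an HTTP endpoint that
# accepts on-demand JSON requests. Stdlib only; `generate` is a
# placeholder for a real LLM call.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate(prompt: str) -> str:
    # Placeholder for a real model call (e.g., a loaded LLM).
    return f"echo: {prompt}"


def handle_request(body: bytes) -> bytes:
    # Parse the JSON payload, run the model, serialize the reply.
    payload = json.loads(body)
    completion = generate(payload["prompt"])
    return json.dumps({"completion": completion}).encode()


class LLMHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        reply = handle_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)


if __name__ == "__main__":
    # Listen for requests on port 8000 until interrupted.
    HTTPServer(("0.0.0.0", 8000), LLMHandler).serve_forever()
```

Once a model is behind an interface like this, it can be containerized and deployed to the Kubernetes cluster we set up later in the chapter.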