11 Deploying an LLM on a Raspberry Pi: How low can you go?
This chapter covers
- Setting up a Raspberry Pi server on your local network.
- Converting and quantizing a model to the GGUF format.
- Serving your model as a drop-in replacement for an OpenAI GPT model.
- What to do next and how to make it better.
Welcome to one of our favorite projects in this book: serving an LLM on a device far smaller than it was ever meant to run on. In this project, we will truly be pushing this technology to the edge, and following along will let you flex everything you’ve learned in this book. We’ll deploy an LLM to a Raspberry Pi set up as an LLM service you can query from any device on your home network. For all the hackers out there, this should open the door to many home projects. For everyone else, it’s a chance to solidify your understanding of the limitations of LLMs and to appreciate that the community has made this possible at all.
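To give a sense of what a "drop-in replacement" for the OpenAI API means in practice, here is a minimal sketch of a chat-completions request aimed at the Pi instead of api.openai.com. The hostname `raspberrypi.local`, the port, and the model name are assumptions for illustration; we will pin down the real values as we set up the server later in the chapter.

```python
# A request to an OpenAI-compatible server on the local network.
# Only the base URL changes relative to OpenAI's hosted API; the
# request path, headers, and JSON body keep the exact same shape.
import json
from urllib.request import Request

# Hypothetical address of the Raspberry Pi on your home network.
BASE_URL = "http://raspberrypi.local:8000/v1"

payload = {
    "model": "llama-2-7b-q4",  # hypothetical quantized GGUF model name
    "messages": [
        {"role": "user", "content": "Hello from my home network!"},
    ],
}

# Build the same POST you would send to OpenAI's /v1/chat/completions.
req = Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
```

Because the request shape is identical, existing client code (including the official OpenAI SDKs, which let you override the base URL) can switch to your local server by changing a single configuration value.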