5 Hosting, scaling, and load testing

 

This chapter covers

  • Choosing a way to deploy your app
  • Containerizing your app
  • Wiring your Azure Web App to GitHub for automatic builds & releases
  • Scaling up to handle many queries
  • Using load testing tools like Locust to ensure your RAG app doesn’t break under pressure

Up to now our RAG chatbot has lived a sheltered life: a single process on one laptop, an in-memory SQLite file, zero real users. That’s perfect for experimentation, but production traffic is a far less forgiving audience. Conversations arrive in bursts, browser tabs multiply, and sooner or later someone in finance asks why the chatbot takes 5 minutes to answer. This chapter is the bridge between “it works on my machine” and “it survives a stampede.”

We’ll start by talking about statelessness, the north-star principle behind modern deployment. If any copy of our service should be able to handle any request, then local disks are off-limits for durable storage, configuration must travel through environment variables, and startup needs to be fast enough that a cluster can kill and replace instances at will. That philosophy naturally points us toward containers, because a container captures every library, build step, and port exposure in a single artifact that can boot identically on a developer laptop, Azure App Service, or a Kubernetes node running on a computer in your garage.
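To make the "configuration travels through environment variables" idea concrete, here is a minimal sketch of what stateless startup configuration might look like. The variable names (COSMOS_ENDPOINT, COSMOS_DATABASE, PORT) and the Settings class are hypothetical placeholders chosen for illustration, not the names the chapter's code necessarily uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Configuration read once at startup; nothing is loaded from local disk."""
    cosmos_endpoint: str   # hypothetical: URL of the chat-log database
    cosmos_database: str   # hypothetical: database name
    port: int              # port the web server listens on


def load_settings(env=None) -> Settings:
    """Build Settings from environment variables, failing fast if one is missing."""
    # Accept a plain dict for testing; default to the real process environment.
    env = os.environ if env is None else env

    required = ("COSMOS_ENDPOINT", "COSMOS_DATABASE")  # assumed names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )

    return Settings(
        cosmos_endpoint=env["COSMOS_ENDPOINT"],
        cosmos_database=env["COSMOS_DATABASE"],
        port=int(env.get("PORT", "8000")),  # fall back to 8000 when PORT is unset
    )
```

Because every instance builds its settings from the environment alone, the same container image can run unchanged on a laptop or in the cloud, and a missing value surfaces at startup rather than on the first user request.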

5.1 Packaging and containers

5.1.1 Removing state from the container

5.1.2 Creating a Cosmos DB database for chat logs

5.1.3 Modifying the chapter 4 code to use the new Cosmos DB database

5.1.4 Returning the final answer for integration tests

5.1.5 Getting ready to containerize

5.2 Deploying

5.2.1 Creating a container registry

5.2.2 Creating a web app

5.2.3 Setting up the web app

5.2.4 Setting up connections with GitHub

5.3 Trying out your deployed app

5.4 Breaking things, on purpose: load testing with Locust

5.4.1 Adding logging

5.4.2 Load testing using Locust

5.4.3 Fixing the things that Locust breaks

5.5 Summary