8 Design a rate-limiting service

 

This chapter covers

  • Using rate limiting
  • Discussing a rate-limiting service
  • Understanding various rate-limiting algorithms

Rate limiting is a common service that we should almost always mention during a system design interview, and it appears in most of the example questions in this book. This chapter addresses two situations: 1) the interviewer asks for more details when we mention rate limiting during an interview, and 2) the question itself is to design a rate-limiting service.

Rate limiting defines the rate at which consumers can make requests to API endpoints. Rate limiting prevents inadvertent or malicious overuse by clients, especially bots. In this chapter, we refer to such clients as “excessive clients”.

Examples of inadvertent overuse include the following:

  • Our client is another web service that experienced a (legitimate or malicious) traffic spike.
  • The developers of that service decided to run a load test on their production environment.

Such inadvertent overuse causes a “noisy neighbor” problem, where one client consumes too many of our service's resources, so our other clients experience higher latency or a higher rate of failed requests.
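As a minimal sketch of how a per-client limit can contain a noisy neighbor, the following Python class implements a fixed-window counter, one of the simplest rate-limiting algorithms. The class name, the `allow` method, the `clock` parameter, and the in-memory dictionary are all assumptions made for illustration; a real rate-limiting service would need to share this state across hosts.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window.

    Illustrative only: state lives in one process's memory.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        # Hypothetical hook: the clock is injectable so the limiter can be
        # exercised in tests without sleeping.
        self.clock = clock
        # Maps client_id -> (window_index, requests_seen_in_that_window).
        self.windows = {}

    def allow(self, client_id):
        """Return True if the request is within the client's quota."""
        current_window = int(self.clock() // self.window_seconds)
        window, count = self.windows.get(client_id, (current_window, 0))
        if window != current_window:
            # A new window has started, so the client's count resets.
            window, count = current_window, 0
        if count >= self.limit:
            # This client is excessive; other clients are unaffected.
            return False
        self.windows[client_id] = (window, count + 1)
        return True
```

With `limit=2`, a third request from the same client inside the same window is rejected, while requests from any other client are still accepted. Each client is counted separately, so one excessive client cannot exhaust the quota of the others.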

Malicious attacks include the following. There are other bot attacks that rate limiting does not prevent (see https://www.cloudflare.com/learning/bots/what-is-bot-management/ for more information).

8.1 Alternatives to a rate-limiting service and why they are infeasible

8.2 When not to do rate limiting

8.3 Functional requirements

8.4 Non-functional requirements

8.4.1 Scalability

8.4.2 Performance

8.4.3 Complexity

8.4.4 Security and privacy

8.4.5 Availability and fault-tolerance

8.4.6 Accuracy

8.4.7 Consistency

8.5 Discuss user stories and required service components

8.6 High-level architecture

8.7 Stateful approach/sharding