Chapter 9. Interaction redundancy: Retries and other control loops


This chapter covers

  • Retries: repeating access attempts on timeouts
  • Retry storms
  • Safe and idempotent services
  • Fallbacks
  • Control loops

While surfing the web, what do you do when a web page you’re trying to access fails to load? You hit the refresh button, right? I’ve talked a lot about redundant service instances, but now I want to turn to another place where redundancy is used in cloud-native software: when making requests. Just as depending on a single instance of an app to always be up is untenable, so too is depending on every request to succeed without trouble. Instead, your software will repeat requests, just as you do. Well, maybe not just as. Let’s explore this a bit.

The case that I started with is the simplest: you’re loading a page to read it. For example, you might be looking at the Hacker News homepage, where the headline “Monks Who Play Punk (2007)” catches your fancy, and you click the link to read the full article. The article doesn’t load, or it only partially loads, so you hit the refresh button and all is fine.
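To make the refresh-button analogy concrete, here is a minimal sketch of what "hitting refresh" might look like in code. It isn't taken from any particular library: the names fetch_with_retries, max_attempts, and delay_seconds are hypothetical, and the sketch assumes the request only reads data, so repeating it does no harm.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 3,      # hypothetical default, chosen for illustration
    delay_seconds: float = 0.5,  # hypothetical fixed pause between attempts
) -> T:
    """Call fetch() until it succeeds or max_attempts is used up.

    Assumes fetch() is a read-only request, so repeating it is harmless.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure to the caller
            time.sleep(delay_seconds)  # pause briefly, then "hit refresh" again
    raise AssertionError("unreachable")  # the loop always returns or raises
```

A caller would pass in whatever function performs the read, for example fetch_with_retries(load_article), where load_article is a stand-in for your own request code. Note the cap on attempts: a person eventually stops hitting refresh, and the software should too.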

9.1. Request retries

9.2. Fallback logic

9.3. Control loops