6 Designing reliable services


This chapter covers

  • The impact of service availability on application reliability
  • Designing microservices that defend against faults in their dependencies
  • Applying retries, rate limits, circuit breakers, health checks, and caching to mitigate interservice communication issues
  • Applying safe communication standards across many services

No microservice is an island; each one plays a small part in a much larger system. Most services that you build will have other services that rely on them (upstream collaborators) and will in turn depend on other services (downstream collaborators) to perform useful functions. For a service to perform its job reliably and consistently, it needs to be able to trust these collaborators.

6.1 Defining reliability

6.2 What could go wrong?

6.2.1 Sources of failure

6.2.2 Cascading failures

6.3 Designing reliable communication

6.3.1 Retries

6.3.2 Fallbacks

6.3.3 Timeouts

6.3.4 Circuit breakers

6.3.5 Asynchronous communication

6.4 Maximizing service reliability

6.4.1 Load balancing and service health

6.4.2 Rate limits

6.4.3 Validating reliability and fault tolerance

6.5 Safety by default

6.5.1 Frameworks

6.5.2 Service mesh