6 Designing reliable services

This chapter covers

The impact of service availability on application reliability
Designing microservices that defend against faults in their dependencies
Applying retries, rate limits, circuit breakers, health checks, and caching to mitigate interservice communication issues
Applying safe communication standards across many services

No microservice is an island; each one plays a small part in a much larger system. Most services that you build will have other services that rely on them (upstream collaborators) and will in turn depend on other services (downstream collaborators) to perform useful functions. For a service to do its job reliably and consistently, it needs to be able to trust these collaborators.

6.1 Defining reliability

6.2 What could go wrong?

6.2.1 Sources of failure

6.2.2 Cascading failures

6.3 Designing reliable communication

6.3.1 Retries

6.3.2 Fallbacks

6.3.3 Timeouts

6.3.4 Circuit breakers

6.3.5 Asynchronous communication

6.4 Maximizing service reliability

6.4.1 Load balancing and service health

6.4.2 Rate limits

6.4.3 Validating reliability and fault tolerance

6.5 Safety by default

6.5.1 Frameworks

6.5.2 Service mesh

Summary