Chapter 8. Strategies for fault tolerance and monitoring
This chapter covers
- What is latency?
- Why do microservices need to be fault tolerant?
- How do circuit breakers work?
- What tools can mitigate distributed failures?
Using the example from the previous chapters, you’ll expand the functionality of Stripe and Payment to include fault mitigation as you explore the concepts of fault tolerance and monitoring. Fault tolerance is especially important when your Payment microservice communicates over a network with external systems. Whenever you communicate across a network, you need to expect failures and time-outs.
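To make "expect failures and time-outs" concrete, here is a minimal sketch in plain Java (9 or later). It isn't the tooling this chapter goes on to use, and it isn't the example application's code. The class name `TimeoutSketch`, the method `callWithTimeout`, the simulated remote calls, and the fallback value are all hypothetical, chosen only for illustration. The sketch bounds how long a caller waits for a remote call and substitutes a fallback value if the call is too slow or fails.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class TimeoutSketch {

    // Runs remoteCall on another thread. Returns its result if it completes
    // within timeoutMillis; returns fallback if it is too slow or throws.
    static String callWithTimeout(Supplier<String> remoteCall,
                                  long timeoutMillis,
                                  String fallback) {
        return CompletableFuture.supplyAsync(remoteCall)
                // Too slow: complete with the fallback instead of waiting.
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                // Failed: swallow the exception and use the fallback.
                .exceptionally(error -> fallback)
                .join();
    }

    // Hypothetical stand-in for a remote call that takes delayMillis to answer.
    static Supplier<String> slowCall(long delayMillis) {
        return () -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "charged";
        };
    }

    // Hypothetical stand-in for a remote call that fails outright.
    static Supplier<String> failingCall() {
        return () -> {
            throw new IllegalStateException("connection refused");
        };
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(slowCall(0), 1000, "payment-pending"));
        System.out.println(callWithTimeout(slowCall(2000), 100, "payment-pending"));
        System.out.println(callWithTimeout(failingCall(), 1000, "payment-pending"));
    }
}
```

Note that in this sketch the slow call keeps running in the background after the caller has given up. A time-out limits how long the caller waits, not how much work the remote side does.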
Figure 8.1 revisits what your distributed architecture for microservices looks like.
How is this distributed architecture relevant to failures? Because each microservice contains a smaller chunk of business logic, as opposed to a monolith that contains everything, you end up with significantly more services to maintain. You’re no longer dealing with a UI that communicates with a single backend service that handles all its needs. More likely, that same UI now integrates with dozens of microservices or more, each of which needs to be just as reliable as your previous monolith.
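A rough calculation shows why the number of services matters. The sketch below rests on simplifying assumptions that aren't from the example application: a request needs every one of the services it touches, the services fail independently of one another, and each is available 99.9 percent of the time. Under those assumptions, the combined availability is the per-service availability raised to the power of the number of services. With 36 services, that works out to about 96.5 percent. The class and method names are hypothetical.

```java
public class AvailabilitySketch {

    // Probability that all serviceCount services are up at once, assuming
    // each is up with probability perService and failures are independent.
    static double combinedAvailability(double perService, int serviceCount) {
        return Math.pow(perService, serviceCount);
    }

    public static void main(String[] args) {
        double perService = 0.999; // assumed availability of a single service
        for (int count : new int[] {1, 12, 36}) {
            System.out.printf("%d service(s): %.4f%n",
                    count, combinedAvailability(perService, count));
        }
    }
}
```

Real systems rarely meet the independence assumption exactly, but the direction of the result holds: the more services a request depends on, the more often at least one of them is unavailable, unless the caller can tolerate that.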