Chapter 8. Strategies for fault tolerance and monitoring

This chapter covers

What is latency?
Why do microservices need to be fault tolerant?
How do circuit breakers work?
What tools can mitigate distributed failures?

As you explore the concepts of fault tolerance and monitoring, you’ll extend the example from the previous chapters, adding fault mitigation to Stripe and Payment. Fault tolerance is especially important when your Payment microservice communicates over a network with external systems, because any call across a network can fail or time out and your code needs to expect that.

8.1. Microservice failures in a distributed architecture

Figure 8.1 revisits what your distributed architecture for microservices looks like.

Figure 8.1. Microservices in a distributed architecture

How is this distributed architecture relevant to failures? By virtue of your microservices containing smaller chunks of business logic, as opposed to a monolith that contains everything, you end up with a significantly larger number of services to maintain. You’re no longer dealing with a UI that might communicate with a single backend service that handles all its needs. More likely, that same UI is now integrating with dozens of microservices, or more, that need to be just as reliable as your previous monolith.
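
A back-of-the-envelope estimate shows why reliability gets harder as the number of services grows. Assume, purely for illustration, that a single UI request depends on n services, that each service is available with probability a, and that failures are independent. The symbols and the figures of 99.99% and 30 services are assumptions chosen for the example, not measurements, and real failures are often correlated.

```latex
A_{\text{request}} = a^{n},
\qquad
0.9999^{30} \approx 0.997
```

Under those assumptions, thirty services that are each available 99.99% of the time give the request as a whole roughly 99.7% availability: about 3 failed requests per 1,000, compared with 1 per 10,000 for any single service.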
