7 When bad things happen: Resiliency patterns with Spring Cloud and Netflix Hystrix
This chapter covers
- Implementing circuit breakers, fallbacks, and bulkheads
- Using the circuit breaker pattern to conserve microservice client resources
- Using Hystrix when a remote service is failing
- Implementing Hystrix’s bulkhead pattern to segregate remote resource calls
- Tuning Hystrix’s circuit breaker and bulkhead implementations
- Customizing Hystrix’s concurrency strategy
All systems, especially distributed systems, will experience failure. How we build our applications to respond to that failure is a critical part of every software developer’s job. However, when it comes to building resilient systems, most software engineers take into account only the complete failure of a piece of infrastructure or a critical service. They focus on building redundancy into each layer of their application using techniques such as clustering key servers, load balancing between services, and segregating infrastructure into multiple locations.
While these approaches take into account the complete (and often spectacular) loss of a system component, they address only one small part of building resilient systems. When a service crashes, it’s easy to detect that it’s no longer there, and the application can route around it. However, when a service is running slowly, detecting that poor performance and routing around it is extremely difficult because