This chapter covers:
- The importance of resiliency
- Client-side load balancing
- Retries / Budgets / Timeouts
- Circuit breaking and bulkheads
- Advice for migrating away from application-level resilience libraries
Once traffic is coming into our cluster through the Istio ingress gateway (covered in Chapter 4), we can manipulate it at the request level and control exactly which versions, or "subsets", of a service receive certain requests. In the previous chapter, we covered this traffic control for weighted routing, request-match based routing, and the release patterns those capabilities enable. We can also use this traffic control to route around problems in the event of application errors, network partitions, and other major issues.
The problem with distributed systems is that they often fail in unpredictable ways, and we will not be able to shift traffic manually every time they do. What we need is a way to build sensible behaviors into our applications so they can respond on their own when they encounter problems. With Istio, we can do that by adding timeouts, retries, and circuit breaking, without having to alter application code. In this chapter we’ll take a look at how to do this and what the implications are for the rest of the system.
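To give a sense of what this kind of configuration can look like, here is a minimal sketch of an Istio `VirtualService` that sets a request timeout and a retry policy. The service name `my-service` and all of the values are hypothetical, chosen only for illustration and not as recommendations.

```yaml
# A minimal sketch, not a production configuration.
# "my-service" is a hypothetical service name; all values are illustrative.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
    # Overall deadline for the request, including any retries
    timeout: 2s
    retries:
      # Number of retries allowed for a failed request
      attempts: 2
      # Deadline for each individual attempt
      perTryTimeout: 500ms
      # Conditions under which the proxy retries
      retryOn: 5xx,connect-failure
```

The point to notice is that these behaviors are declared in mesh configuration and applied by the service proxy, so the calling application itself does not change.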