chapter six

6 Resilience: Solving application networking challenges

This chapter covers

Understanding the importance of resilience
Leveraging client-side load balancing
Implementing request timeouts and retries
Circuit breaking and connection pooling
Migrating from application libraries used for resilience

Once we have traffic coming into our cluster through the Istio ingress gateway (covered in chapter 4), we can manipulate the traffic at the request level and control exactly where to route each request. In the previous chapter, we covered traffic control for weighted routing, request-match-based routing, and the release patterns that these capabilities enable. We can also use this traffic control to route around problems such as application errors, network partitions, and other major issues.

The problem with distributed systems is that they often fail in unpredictable ways, and we cannot manually take traffic-shifting actions. We need a way to build sensible behaviors into our applications so they can respond on their own when they encounter problems. With Istio, we can add timeouts, retries, and circuit breaking without having to alter application code. In this chapter, we look at how to do this and the implications for the rest of the system.
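
To make the idea concrete before the details, here is a minimal sketch of how a timeout and a retry policy might be declared in Istio configuration rather than in application code. The service name `backend`, the resource name, and every value shown are hypothetical and chosen only for illustration; they are not taken from the scenario used in this chapter.

```yaml
# Illustrative sketch only: "backend" is a hypothetical service name,
# and the timeout and retry values are assumptions, not recommendations.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend-resilience
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
    # Overall deadline for the request, including any retries
    timeout: 2s
    retries:
      # Retry up to three times, each attempt bounded by its own timeout
      attempts: 3
      perTryTimeout: 500ms
      # Retry only on 5xx responses and connection failures
      retryOn: 5xx,connect-failure
```

Because these settings live in the mesh configuration, the calling application needs no code changes to pick them up. Circuit breaking is configured separately, on a DestinationRule resource rather than a VirtualService.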

6.1 Building resilience into the application

6.1.1 Building resilience into application libraries

6.1.2 Using Istio to solve these problems

6.1.3 Decentralized implementation of resilience

6.2 Client-side load balancing

6.2.1 Getting started with client-side load balancing

6.2.2 Setting up our scenario

6.2.3 Testing various client-side load-balancing strategies

6.2.4 Understanding the different load-balancing algorithms

6.3 Locality-aware load balancing

6.3.1 Hands-on with locality load balancing

6.3.2 More control over locality load balancing with weighted distribution

6.4 Transparent timeouts and retries

6.4.1 Timeouts

6.4.2 Retries