chapter ten

10 Troubleshooting the data plane

 

This chapter covers

  • Troubleshooting a misconfigured workload
  • Detecting and preventing misconfigurations using istioctl
  • Using istioctl to investigate service proxy configuration
  • Making sense of Envoy logs to understand service proxy behavior
  • Gaining insights into your apps using the collected telemetry

When communicating over the network, many things can go wrong, as we’ve demonstrated throughout this book. A big reason Istio exists is to shine a light on network communication when things go wrong and to put in place remediation capabilities such as timeouts, retries, and circuit breaking so that applications can respond to network issues automatically. The service proxy deployed alongside each workload gives us a very detailed view of what’s happening on the network, but what happens when the proxy itself behaves unexpectedly?

Figure 10.1. Components that participate in routing a request

As the figure above shows, the components that participate in serving a request are:

  1. Istiod, which ensures the data plane is synchronized to the desired state
  2. The Ingress Gateway, which admits traffic into the cluster
  3. The service proxy, which provides access control and handles traffic from the downstream to the local application
  4. The application itself, which serves the request. The application might call another service, which continues the chain to
  5. Another upstream service…
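
Because any of these components can be the source of a problem, it can help to think of them as an ordered chain and check each hop in turn. The following is only a minimal sketch of that idea, not an Istio API: the `Hop` type, the `first_unhealthy` helper, and the hard-coded health probes are all hypothetical stand-ins for whatever checks you would actually run against each component.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hop:
    """One component on the request path (hypothetical model, not an Istio type)."""
    name: str
    role: str
    healthy: Callable[[], bool]  # hypothetical probe supplied by the caller


def first_unhealthy(hops: List[Hop]) -> Optional[str]:
    """Return the name of the first hop whose probe fails, or None if all pass."""
    for hop in hops:
        if not hop.healthy():
            return hop.name
    return None


# The components listed above, in order. The probe results are hard-coded
# for illustration only; here the service proxy is pretended to be at fault.
REQUEST_PATH = [
    Hop("istiod", "keeps the data plane synchronized to the desired state", lambda: True),
    Hop("ingress-gateway", "admits traffic into the cluster", lambda: True),
    Hop("service-proxy", "access control; hands traffic to the local application", lambda: False),
    Hop("application", "serves the request", lambda: True),
    Hop("upstream-service", "next service in the chain", lambda: True),
]

print(first_unhealthy(REQUEST_PATH))
```

Walking the chain in order mirrors how the rest of the chapter proceeds: first confirm that the control plane and the data plane agree, then narrow down to the individual proxy and application.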

10.1 The most common mistake: A misconfigured Data Plane

10.2 Identifying data plane issues

10.2.1 How to verify that the data plane is up to date

10.2.2 Discover misconfiguration with Kiali

10.2.3 Discover misconfiguration with istioctl

10.3 Discover misconfiguration manually from the Envoy config

10.3.1 Envoy Administration Dashboard

10.3.2 Querying proxy configuration using istioctl

10.3.3 Troubleshooting Application Issues

10.3.4 Inspect network traffic with ksniff

10.4 Understand your application using Envoy Telemetry

10.4.1 Finding the rate of failing requests in Grafana

10.4.2 Querying the affected pods using Prometheus

10.5 Summary