chapter nine

9 Troubleshooting the data plane

 

This chapter covers:

  • Troubleshooting a misconfigured workload
  • Detecting and preventing misconfigurations using Istioctl
  • Using Istioctl to investigate service proxy configuration
  • Making sense of Envoy logs to understand service proxy behavior
  • Gaining insights into your apps using the collected telemetry

Debugging the service mesh can be a daunting task because many components participate in serving a request, and issues can occur in any of them. This is not a plight brought upon us by Istio; it is the nature of distributed systems.

Figure 9.1. Components that participate in routing a request

As the figure above shows, the components that participate in serving a request are:

  1. Istio Pilot, which ensures the data plane is synchronized to the desired state
  2. The Ingress Gateway, which admits traffic into the cluster
  3. The service proxy, which provides access control and handles traffic from the downstream to the local application
  4. The application itself, which serves the request. The application might in turn call another service, which continues the chain to
  5. Another upstream service…

Thus, a failure can originate in any component of this chain. Debugging every component could take a lot of time, which we don’t have when apps across the entire cluster are impacted.

9.1 The most common mistake: A misconfigured Data Plane

9.2 Identifying data plane issues

9.2.1 How to verify that the data plane is up to date

9.2.2 Discover misconfiguration with Kiali

9.2.3 Discover misconfiguration with Istioctl

9.3 Discover misconfiguration manually from the Envoy config

9.3.1 Envoy Administration Dashboard

9.3.2 Querying proxy configuration using Istioctl

9.3.3 Troubleshooting Application Issues

9.3.4 Inspect network traffic with ksniff

9.4 Understand your application using Envoy Telemetry

9.4.1 Finding the rate of failing requests in Grafana

9.4.2 Querying the affected pods using Prometheus

9.5 Summary