This chapter covers
- Understanding the factors that affect control plane performance
- Monitoring control plane performance
- Identifying the key performance metrics
- Optimizing control plane performance
In the previous chapter on troubleshooting the data plane, we took a deep dive into the debugging tools available to us for diagnosing issues with proxy configuration and proxy behavior. Understanding the service proxy configuration simplifies troubleshooting when the proxy's behavior does not match our expectations.
In this chapter, we focus on optimizing control plane performance. We’ll investigate how the control plane configures the service proxies, which factors slow this process down, how to monitor it, and which knobs we can turn to improve its performance.
In previous sections, we’ve said that the control plane is the brains of the service mesh and that it exposes an API for service mesh operators. This API can be used to manipulate the behavior of the mesh and to configure the service proxies deployed alongside each workload instance. What we omitted for brevity is that requests made to this API by us, the service mesh operators, are not the only way the behavior and configuration of the mesh are affected. More generally, the control plane abstracts away details of the runtime environment, such as which services exist (service discovery), which services are healthy, autoscaling events, and so on.
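To make the two sources of change concrete, here is a minimal toy sketch in Python. It is not how any real control plane is implemented; the `ControlPlane` class, its method names, and the `example-service` name are all hypothetical and exist only for illustration. It shows that both an operator request and a runtime environment event end up producing a new configuration for the proxies.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy model (hypothetical): any change to the desired state triggers a push."""
    routing_rules: dict = field(default_factory=dict)  # set by operators through the API
    endpoints: dict = field(default_factory=dict)      # learned from the runtime environment
    pushes: list = field(default_factory=list)         # log of (reason, config snapshot)

    def apply_operator_config(self, service, rule):
        # Source 1: a service mesh operator changes mesh behavior via the API.
        self.routing_rules[service] = rule
        self._push(f"operator config for {service}")

    def on_environment_event(self, service, healthy_endpoints):
        # Source 2: the environment changes (discovery, health, autoscaling).
        self.endpoints[service] = list(healthy_endpoints)
        self._push(f"environment event for {service}")

    def _push(self, reason):
        # Record the configuration snapshot that would be sent to the proxies.
        snapshot = {
            "rules": dict(self.routing_rules),
            "endpoints": {name: list(eps) for name, eps in self.endpoints.items()},
        }
        self.pushes.append((reason, snapshot))


cp = ControlPlane()
cp.apply_operator_config("example-service", {"timeout": "1s"})
cp.on_environment_event("example-service", ["10.0.0.1", "10.0.0.2"])  # scale up
cp.on_environment_event("example-service", ["10.0.0.1"])              # one instance became unhealthy
print(len(cp.pushes))
```

In this sketch, only one of the three configuration pushes was caused by an operator; the other two came from the environment. That ratio is made up, but it illustrates why the rate of change in the environment matters for the control plane's workload.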