19 Controlling workload placement and automatic scaling


Kubernetes decides where to run your workloads, spreading them around the cluster to get the best use of your servers and the highest availability for your apps. Deciding which node will run a new Pod is the job of the scheduler, one of the control plane components. The scheduler uses all the information it can get to choose a node: it looks at the compute capacity of each node and the resources used by existing Pods, and it applies policies that you can set in your application specs to take more control over where Pods will run. In this chapter, you'll learn how to direct Pods to specific nodes and how to schedule the placement of Pods in relation to other Pods.
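As a preview of the kind of spec you'll work with, here is a minimal sketch of a Pod that gives the scheduler both pieces of information: a resources block stating how much capacity the Pod needs, and a nodeSelector restricting it to nodes with a matching label. The Pod name and image are hypothetical; kubernetes.io/os is a standard label Kubernetes applies to every node.

apiVersion: v1
kind: Pod
metadata:
  name: whoami                    # hypothetical Pod name
spec:
  containers:
    - name: web
      image: nginx:1.25           # any container image will do here
      resources:
        requests:                 # the scheduler finds a node with this much spare capacity
          cpu: 100m
          memory: 128Mi
  nodeSelector:
    kubernetes.io/os: linux       # Pod will only be scheduled on Linux nodes

The nodeSelector is the simplest placement control; affinity rules, which you'll see in section 19.2, express the same idea with much richer matching logic.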

We’ll also cover two other sides of workload placement in this chapter: automatic scaling and Pod eviction. Autoscaling lets you specify the minimum and maximum number of replicas for your app, along with a metric Kubernetes uses to measure how hard your app is working. If the Pods are overworked, the cluster automatically scales up, adding more replicas, and scales down again when the load reduces. Eviction is the extreme scenario where a node is running out of resources and Kubernetes removes Pods to keep the server stable. There are some intricate details in this chapter, but it’s the principles that matter for striking the right balance between a healthy cluster and high-performing apps.
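As a taste of the scaling side, this is a minimal sketch of a HorizontalPodAutoscaler using the standard autoscaling/v2 API. The target Deployment name is hypothetical; the metric here is CPU utilization, measured as a percentage of the Pods' CPU requests.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: whoami-hpa
spec:
  scaleTargetRef:                 # the workload this autoscaler manages
    apiVersion: apps/v1
    kind: Deployment
    name: whoami                  # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75  # add replicas when average CPU passes 75% of requests

Once this object is deployed, the autoscaler controller adjusts the Deployment's replica count between the minimum and maximum, based on the observed metric; section 19.3 walks through this in detail.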

19.1 How Kubernetes schedules workloads

19.2 Directing Pod placement with affinity and antiaffinity

19.3 Controlling capacity with automatic scaling

19.4 Protecting resources with preemption and priorities

19.5 Understanding the controls for managing workloads

19.6 Lab