Chapter 15. Automatic scaling of pods and cluster nodes
This chapter covers
- Configuring automatic horizontal scaling of pods based on CPU utilization
- Configuring automatic horizontal scaling of pods based on custom metrics
- Understanding why vertical scaling of pods isn’t possible yet
- Understanding automatic horizontal scaling of cluster nodes
Applications running in pods can be scaled out manually by increasing the replicas field in the ReplicationController, ReplicaSet, Deployment, or other scalable resource. Pods can also be scaled vertically by increasing their containers' resource requests and limits (though this can currently only be done at pod creation time, not while the pod is running). Although manual scaling is fine when you can anticipate load spikes in advance or when the load changes gradually over longer periods of time, requiring manual intervention to handle sudden, unpredictable traffic increases isn't ideal.
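For example, you can scale a Deployment out by hand either declaratively, by raising the replicas field in its spec and reapplying the manifest, or imperatively with the kubectl scale command. The Deployment name kubia and the replica count used here are only illustrative:

# Declaratively: set the desired replica count in the Deployment's spec
spec:
  replicas: 4

# Imperatively: the same change with kubectl (Deployment name "kubia" is just an example)
$ kubectl scale deployment kubia --replicas=4

Either way, you're the one deciding when to scale and by how much; the controller only makes sure the actual number of pods matches the number you asked for.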
Luckily, Kubernetes can monitor your pods and scale them up automatically as soon as it detects an increase in the CPU usage or some other metric. If running on a cloud infrastructure, it can even spin up additional nodes if the existing ones can’t accept any more pods. This chapter will explain how to get Kubernetes to do both pod and node autoscaling.
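As a quick preview of what the following sections explain in detail, a horizontal autoscaler can be created imperatively with the kubectl autoscale command; the target Deployment name and the thresholds shown here are arbitrary example values:

$ kubectl autoscale deployment kubia --cpu-percent=30 --min=1 --max=5

This creates a HorizontalPodAutoscaler object that adjusts the Deployment's replica count to keep the pods' average CPU utilization around 30%, while never going below one or above five replicas.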
The autoscaling feature in Kubernetes was completely rewritten between versions 1.6 and 1.7, so be aware that you may find outdated information on this subject online.