6 Scaling up


This chapter covers

  • Scaling Pods and nodes manually
  • Using CPU utilization and other metrics to scale Pod replicas dynamically
  • Utilizing managed platforms to add and remove nodes based on the resources your Pods require
  • Using low-priority placeholder Pods to provision burst capacity
  • Architecting apps so that they can be scaled

Now that we have the application deployed, with health checks in place to keep it running without intervention, it’s a good time to look at how you’re going to scale up. I’ve named this chapter “Scaling up” because everyone cares deeply about whether their system architecture can handle being scaled up when the application becomes wildly successful and needs to serve all those new users. But don’t worry: I’ll also cover scaling down so you can save money during the quiet periods.

The goal is, ultimately, to operationalize our deployment using automatic scaling. That way, we can be fast asleep or relaxing on a beach in Australia, and our application can be responding to traffic spikes dynamically. To get there, we’ll need to ensure that the application is capable of scaling, understand the scaling interactions of Pods and nodes in the Kubernetes cluster, and determine the right metrics to configure an autoscaler to do it all for us.
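As a preview of where that configuration ends up, here is a minimal sketch of a HorizontalPodAutoscaler that scales a Deployment based on CPU utilization. The Deployment name (timeserver), the replica bounds, and the 80% target are placeholder values for illustration only; we’ll work through choosing real values in section 6.2.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: timeserver-hpa          # hypothetical name for illustration
spec:
  scaleTargetRef:               # the Deployment this autoscaler manages
    apiVersion: apps/v1
    kind: Deployment
    name: timeserver            # placeholder; substitute your Deployment's name
  minReplicas: 1                # never scale below one replica
  maxReplicas: 10               # upper bound, to cap cost
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # target average CPU across all replicas

With an object like this applied, Kubernetes compares the observed average CPU utilization of the Deployment’s Pods against the target and adds or removes replicas within the min/max bounds, which is exactly the hands-off behavior we’re after.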

6.1 Scaling Pods and nodes

6.2 Horizontal Pod autoscaling

6.2.1 External metrics

6.3 Node autoscaling and capacity planning

6.3.1 Cluster autoscaling

6.3.2 Spare capacity with cluster autoscaling

6.4 Building your app to scale

6.4.1 Avoiding state

6.4.2 Microservice architectures

6.4.3 Background tasks

Summary
