6 Scaling up
This chapter covers:
- Scaling Pods and Nodes manually
- Configuring Kubernetes platforms to add and remove nodes automatically based on the resources your Pods require
- Using request-based metrics to dynamically scale Pod replicas
- Using low priority “balloon” pods to provision burst capacity
- Architecting apps so that they can be scaled
Now that we have the application deployed, with health checks in place to keep it running without intervention, it’s a good time to look at how you’re going to scale up. I’ve named this chapter “Scaling up” because I think everyone cares deeply about whether their system architecture can handle being scaled up when the application becomes wildly successful and needs to serve all those new users. But don’t worry: I’ll also cover scaling down, so you can save money during the quiet periods.
Our goal is ultimately to operationalize our deployment using automatic scaling. That way, we can be fast asleep, or relaxing on a beach in Australia, while our application responds to traffic spikes dynamically. To get there, we’ll need to ensure that the application is capable of scaling, understand the scaling interactions of Pods and Nodes in the Kubernetes cluster, and choose the right metrics to configure an autoscaler to do it all for us.
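As a preview of where we’re headed, here is a minimal sketch of the kind of autoscaling configuration this chapter builds up to: a HorizontalPodAutoscaler that scales a Deployment based on CPU utilization relative to the Pods’ resource requests. The Deployment name `timeserver` and the replica bounds here are illustrative assumptions, not fixed requirements.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: timeserver        # hypothetical name, matching an example Deployment
spec:
  scaleTargetRef:         # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: timeserver      # hypothetical Deployment name
  minReplicas: 1          # floor for quiet periods
  maxReplicas: 10         # ceiling for traffic spikes
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target 50% of the Pods' CPU requests
```

Don’t worry if the details aren’t clear yet; the rest of the chapter covers how request-based metrics like this one drive scaling decisions, and how node provisioning keeps up with the Pods the autoscaler creates.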