5 Autoscaling

This chapter covers:

The problems that autoscalers set out to solve.
Descriptions of how Knative Serving’s autoscaling works when there are zero instances, when there are a few instances, and when there are many instances.
A walkthrough of the core autoscaling algorithm.
Descriptions of configuration options and how they affect autoscaling.

Autoscaling awakens the engineering imagination in a way that few topics do. Most of the systems we build seem lifeless or mindless. But to build a system that appears to breathe is somehow uniquely fascinating. Depressingly, though, "autoscaling" turns out to be easy to spell and hard to achieve. The system that breathes peacefully today is yelling obscenities tomorrow.

My goal in this chapter is to explain the basic structure and functioning of the components responsible for managing scaling in Knative Serving: the Autoscaler, the Activator, and the Queue-Proxy. Most of the time you will not need to think about them, as they embody the accumulated observations and insights of the Knative authors. But they are dynamic systems and exhibit dynamic complexity, which means that you will occasionally be surprised. A grasp of the components will help you to moderate your surprise.

5.1 The Autoscaling Problem

5.2 Autoscaling when there are zero instances

5.2.1 Autoscaler Panics

5.3 Autoscaling when there are one or a few instances

5.4 Autoscaling when there are many instances

5.5 A little theory

5.5.1 Control

5.5.2 Queueing

5.6 The actual calculation

5.6.1 To panic, or not to panic, that is the question

5.7 Configuring Autoscaling

5.7.1 How settings get applied

5.7.2 Setting scaling limits

5.7.3 Setting scaling rates

5.7.4 Setting target values

5.7.5 Setting decision intervals

5.7.6 Setting window size

5.7.7 Setting the panic threshold

5.7.8 Setting the target burst capacity