5 Autoscaling

This chapter covers

Problems that autoscalers set out to solve
How Knative Serving’s autoscaling works under various scenarios
A walkthrough of the core autoscaling algorithm
Configuration options and how these affect autoscaling

Autoscaling awakens the engineering imagination in a way that few topics do. Most of the systems we build seem lifeless or mindless. But to build a system that appears to breathe is somehow uniquely fascinating. Depressingly, though, autoscaling turns out to be easy to spell, yet hard to achieve. The system that breathes peacefully today is yelling obscenities tomorrow.

My goal in this chapter is to explain the basic structure and functioning of the components that manage scaling in Knative Serving: the Autoscaler, the Activator, and the Queue-Proxy. Most of the time, you will not need to think about them, because they embody the accumulated observations and insights of the Knative authors. But they are dynamic systems and exhibit dynamic complexity, which means that you will occasionally be surprised. A grasp of the components will help you to moderate your surprise.

5.1 The autoscaling problem

5.2 Autoscaling when there are zero instances

5.2.1 The Autoscaler panics

5.3 Autoscaling when there are one or a few instances

5.4 Autoscaling when there are many instances

5.5 A little theory

5.5.1 Control

5.5.2 Queueing

5.6 The actual calculation

5.6.1 To panic, or not to panic, that is the question

5.7 Configuring autoscaling

5.7.1 How settings get applied

5.7.2 Setting scaling limits

5.7.3 Setting scaling rates

5.7.4 Setting target values

5.7.5 Setting decision intervals