Chapter 16. High availability: auto scaling
Although load balancers are good at gracefully accommodating the sudden loss of a server, one thing they can’t do is replace the lost capacity that the now-dead server originally provided. In other words, if one of your three servers has crashed, the two left behind will now have to manage the full workload on their own. Helping out in that area is well beyond your load balancer’s pay scale.
And then there’s that elasticity thing: load balancers can keep what you’ve got running nicely, but they’re not built to manage change. If you’re worried that unexpected server downtime or increased demand could leave your application unable to properly do its job, you’ll need a way to add capacity. For that, you’ll have to look beyond load balancers to auto scaling.
It’s been a rough week for your web application. Load-balanced EC2 instances have been periodically crashing for no apparent reason (note to self: have a frank discussion with the lead developer ASAP), and customers have been complaining about slow service from your site during high-demand times. Something’s got to change. And here’s where it’s going to happen.
In this chapter, you’ll learn how to use auto scaling to do two things: automate the replacement of instances when they fail, and increase or decrease the number of instances you’re running to keep up with changing customer demand.
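To make those two jobs concrete before getting into the AWS tooling, here is a toy model of the idea. It is a minimal sketch, not the AWS API: the `ScalingGroup` and `Instance` classes, the `reconcile()` method, and the instance IDs are all hypothetical. The min/max/desired capacity terms mirror the vocabulary AWS auto scaling groups use. Each reconcile pass first discards unhealthy instances and launches replacements (self-healing). It then launches or terminates instances until the count matches the desired capacity (elasticity). Two simplifying assumptions: the sketch clamps out-of-range capacity requests, and it terminates the newest instances first.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a running server."""
    instance_id: str
    healthy: bool = True


@dataclass
class ScalingGroup:
    """Toy model of an auto scaling group (illustrative only, not the AWS API)."""
    min_size: int
    max_size: int
    desired: int
    instances: list = field(default_factory=list)
    _next_id: int = 0

    def _launch(self) -> Instance:
        # Create a new instance with a made-up sequential ID.
        self._next_id += 1
        inst = Instance(f"i-{self._next_id:04d}")
        self.instances.append(inst)
        return inst

    def set_desired(self, n: int) -> None:
        # Assumption: clamp the request to the group's min/max bounds.
        self.desired = max(self.min_size, min(self.max_size, n))

    def reconcile(self) -> None:
        # 1. Self-healing: drop instances that failed their health check.
        self.instances = [i for i in self.instances if i.healthy]
        # 2. Scale out: launch until the group reaches desired capacity.
        while len(self.instances) < self.desired:
            self._launch()
        # 3. Scale in: terminate extras (newest first, an arbitrary choice here).
        while len(self.instances) > self.desired:
            self.instances.pop()


group = ScalingGroup(min_size=2, max_size=6, desired=3)
group.reconcile()                  # launches three instances
group.instances[0].healthy = False
group.reconcile()                  # replaces the failed instance
group.set_desired(10)              # clamped to max_size (6)
group.reconcile()
```

In the usage lines at the bottom, a crashed instance is replaced without any change to the desired capacity. A demand-driven request for ten instances is held to the group's maximum of six. Real AWS groups apply their own health checks and termination policies, which later sections cover.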
Predictably, the two main principles of auto scaling are