Chapter 11. Achieving high availability: availability zones, auto-scaling, and CloudWatch
This chapter covers
- Using a CloudWatch alarm to recover a failed virtual server
- Understanding availability zones in an AWS region
- Using auto-scaling to guarantee running virtual servers
- Analyzing disaster-recovery requirements
In this chapter, we’ll teach you how to build a high-availability architecture based on EC2 instances. A virtual server isn’t highly available by default. Any of the following scenarios can cause an outage of your virtual server:
- The virtual server fails because of a software issue (the OS of the virtual server).
- A software issue occurs on the host server, causing the virtual server to crash (the OS of the host server or virtualization layer).
- The computing, storage, or networking hardware of the physical host fails.
- Parts of the data center that the virtual server depends on fail, such as network connectivity, the power supply, or the cooling system.
For example, if the computing hardware of a physical host server fails, all EC2 instances running on this host server will fail. If you’re running an application on an affected virtual server, this application will fail and cause downtime until somebody, probably you, intervenes by starting a new virtual server on another physical host server. To avoid this, you should aim for a highly available virtual server that recovers from failure automatically, without human intervention.