This chapter covers
- Recovering a failed virtual machine with a CloudWatch alarm
- Using auto-scaling to guarantee your virtual machines keep running
- Understanding availability zones in an AWS region
- Analyzing disaster-recovery requirements
Imagine you run an online shop. During the night, the hardware running your virtual machine fails. Until you get to work the next morning, your users can no longer access your web shop. During the 8-hour downtime, your users search for an alternative and stop buying from you. That’s a disaster for any business. Now imagine a highly available web shop. Just a few minutes after the hardware fails, the system recovers, restarts itself on new hardware, and your e-commerce website is back online again, without any human intervention. Your users can continue to shop on your site. In this chapter, we’ll teach you how to build a highly available system like that, based on EC2 instances.
Virtual machines are not highly available by default; the potential for system failure is always present. The following scenarios could cause an outage of your virtual machine: