13 Achieving high availability: Availability zones, autoscaling, and CloudWatch

 

This chapter covers

  • Recovering a failed virtual machine with a CloudWatch alarm
  • Using autoscaling to guarantee your virtual machines keep running
  • Understanding availability zones in an AWS region
  • Analyzing disaster-recovery requirements

Imagine you run an online shop. During the night, the hardware running your virtual machine fails. Your users can no longer access your web shop until the next morning, when you go into work. During the eight-hour downtime, your users search for an alternative and stop buying from you. That's a disaster for any business. Now imagine a highly available web shop. Just a few minutes after the hardware fails, the system recovers by restarting itself on new hardware, and your e-commerce website is back online, without any human intervention. Your users can continue to shop on your site. In this chapter, we'll teach you how to build a highly available system like this one, based on EC2 instances.

Virtual machines are not highly available by default, so the potential for system failure is always present. The following scenarios could cause an outage of your virtual machine:

13.1 Recovering from EC2 instance failure with CloudWatch

13.1.1 How does a CloudWatch alarm recover an EC2 instance?

13.2 Recovering from a data center outage with an Auto Scaling group

13.2.1 Availability zones: Groups of isolated data centers

13.2.2 Recovering a failed virtual machine to another availability zone with the help of autoscaling

13.2.3 Pitfall: Recovering network-attached storage

13.2.4 Pitfall: Network interface recovery

13.2.5 Insights into availability zones

13.3 Architecting for high availability

13.3.1 RTO and RPO comparison for a single EC2 instance

13.3.2 AWS services come with different high availability guarantees

Summary