1 Into the world of chaos engineering


This chapter covers

  • What chaos engineering is and is not
  • Motivations for doing chaos engineering
  • Anatomy of a chaos experiment
  • A simple example of chaos engineering in practice

What would you do to make absolutely sure the car you’re designing is safe? A typical vehicle today is a real wonder of engineering. A plethora of subsystems, operating everything from rain-detecting wipers to life-saving airbags, all come together not only to get you from A to B, but also to protect passengers during an accident. Isn’t it moving when your loyal car gives up the ghost to save yours through the strategic use of crumple zones, from which it will never recover?

Because passenger safety is the highest priority, all these parts go through rigorous testing. But even assuming they all work as designed, does that guarantee you’ll survive a real-world accident? If your business card reads, “New Car Assessment Program,” you demonstrably don’t think so. Presumably, that’s why every new car that makes it to market goes through crash tests.

1.1 What is chaos engineering?

1.2 Motivations for chaos engineering

1.2.1 Estimating risk and cost, and setting SLIs, SLOs, and SLAs

1.2.2 Testing a system as a whole

1.2.3 Finding emergent properties

1.3 Four steps to chaos engineering

1.3.1 Ensure observability

1.3.2 Define a steady state

1.3.3 Form a hypothesis

1.3.4 Run the experiment and prove (or refute) your hypothesis

1.4 What chaos engineering is not

1.5 A taste of chaos engineering

1.5.1 FizzBuzz as a service

1.5.2 A long, dark night

1.5.3 Postmortem

1.5.4 Chaos engineering in a nutshell