8 Application-level fault injection

This chapter covers

Building chaos engineering capabilities directly into your application
Making sure that the extra code doesn’t affect the application’s performance
More advanced usage of ApacheBench

So far, you’ve covered a variety of ways of applying chaos engineering to a selection of different systems. The languages, tools, and approaches varied, but they all had one thing in common: working with source code outside your control. If you’re in a role like Site Reliability Engineer (SRE) or platform engineer, that’s going to be your bread and butter. But sometimes, you will have the luxury of applying chaos engineering to your own code. This chapter focuses on how baking chaos engineering options directly into your application can be a quick, easy, and - dare I say it - fun way of increasing your confidence in the stability of the system as a whole. I’ll guide you through designing and running two experiments: one injecting latency into the functions responsible for communicating with an external cache, and another injecting intermittent failures through the simple means of raising an exception. The example code is written in Python, but don’t worry if it’s not your forte: I promise to keep it basic.
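To make the two ideas concrete before diving in, here is a minimal sketch of what application-level fault injection can look like. It is not the code used later in the chapter: the decorator names, the environment variables CHAOS_DELAY_SECONDS and CHAOS_FAILURE_RATE, and the in-memory dictionary standing in for the external cache are all assumptions made up for illustration.

```python
import functools
import os
import random
import time


def chaos_delay(func):
    """Sleep before calling func if CHAOS_DELAY_SECONDS (hypothetical) is set."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Read the setting on every call, so it can be toggled at runtime.
        delay = float(os.environ.get("CHAOS_DELAY_SECONDS", "0"))
        if delay > 0:
            time.sleep(delay)
        return func(*args, **kwargs)
    return wrapper


def chaos_failure(func):
    """Raise an exception for a fraction of calls, set by CHAOS_FAILURE_RATE
    (hypothetical): 0 means never, 1 means always."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        rate = float(os.environ.get("CHAOS_FAILURE_RATE", "0"))
        if random.random() < rate:
            raise RuntimeError("chaos: injected failure")
        return func(*args, **kwargs)
    return wrapper


# Stand-in for the external cache; a real application would talk to a
# cache server over the network here.
_FAKE_CACHE = {}


@chaos_delay
def cache_get(key):
    """Look up a key in the (fake) cache, optionally slowed down."""
    return _FAKE_CACHE.get(key)


@chaos_failure
def handle_request(user_id):
    """Serve a request, optionally failing before any work is done."""
    return cache_get(user_id) or "no recommendations"
```

With neither variable set, both decorators do nothing beyond reading an environment variable and calling the wrapped function, which is the property you want if the extra code is to ship with the application. Setting CHAOS_DELAY_SECONDS=0.1 would slow every cache lookup by about 100 ms, and CHAOS_FAILURE_RATE=0.3 would make roughly three in ten requests raise an exception.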

Note

Not only for Python!

8.1 Scenario

8.1.1 Implementation details - before chaos

8.2 Experiment 1 - Redis latency

8.2.1 Experiment 1 plan

8.2.2 Experiment 1 steady state

8.2.3 Experiment 1 implementation

8.2.4 Experiment 1 execution

8.2.5 Experiment 1 discussion

8.3 Experiment 2 - failing requests

8.3.1 Experiment 2 plan

8.3.2 Experiment 2 implementation

8.3.3 Experiment 2 execution

8.4 Application versus infrastructure

8.5 Summary