7 Handling failure with grace
This chapter covers
- Identifying the main categories of failures
- Avoiding common failure-handling antipatterns
- Logging errors effectively
- Designing failure-handling strategies
Before diving deeper into our rosette of beautiful code, we need to step back and address a cross-cutting concern that touches every dimension: failure.
Because let’s face it, the code will inevitably fail at some point. Whether it’s unexpected data, an overlooked bug, or an external system suddenly taking the day off, something will go wrong. And it could get worse, snowballing into a complete system failure. That’s life. But that doesn’t mean you can’t be prepared.
Handling failures gracefully is neither trivial nor something to treat as an afterthought. It’s about ensuring system robustness and providing meaningful clues to trace the root cause, while preserving the clarity of the code’s story and the sanity of your end users. Proper failure management must be an integral part of the design from the outset. In this chapter, we’ll explore how to design robust failure-handling strategies, avoid common pitfalls, and ensure your users never have to face mishandled errors again.
7.1 Categorizing failure
Not all failures are created equal. Some are part of the natural flow of programs, others stem from coding errors or infrastructure issues, and some are so critical that they can take the entire system down if not handled carefully.