9 Wasting a perfectly good incident

 

This chapter covers

  • Conducting blameless postmortems
  • Addressing the mental models people have during an incident
  • Generating action items that further the improvement of the system

I define an incident as any unexpected or unplanned event that has an adverse effect on the system. Some companies reserve the term for large catastrophic events, but this broader definition gives your team more opportunities to learn when an incident occurs.

As mentioned previously, continuous improvement is at the center of DevOps. Incremental change is a win in a DevOps organization. But the fuel that powers continuous improvement is continual learning: learning about new technologies, existing technologies, how teams operate, how teams communicate, and how all these things interrelate to form the human-technical systems that are engineering departments.

9.1 The components of a good postmortem

9.1.1 Creating mental models

9.1.2 Following the 24-hour rule

9.1.3 Setting the rules of the postmortem

9.2 The incident

9.3 Running the postmortem

9.3.1 Choosing whom to invite to the postmortem

9.3.2 Running through the timeline

9.3.3 Defining action items and following up

9.3.4 Documenting your postmortem

9.3.5 Sharing the postmortem

Summary