11 Fixing failures


This chapter covers

  • Determining how to roll forward failed changes to restore functionality
  • Organizing an approach for IaC troubleshooting
  • Categorizing repairs for failed changes

We took many chapters to discuss writing and collaborating on infrastructure as code. All of the practices and principles you learned for IaC culminate in the crucial moment when you push a change, it causes your system to fail, and you need to roll it back! However, IaC doesn’t support rollback: you do not fully revert IaC changes. What does it mean to fix failures if you don’t roll them back?

This chapter focuses on fixing failed IaC changes. First, we’ll discuss what it means to “revert” IaC changes by rolling forward. Then, you’ll learn workflows for troubleshooting and fixing a failed change. While the techniques in this chapter might not apply to every scenario you’ll encounter in your system, they establish a broad set of practices you can use to start repairing IaC failures.

11.1 Restoring functionality

11.1.1 Rolling forward to revert changes

11.1.2 Rolling forward for new changes

11.2 Troubleshooting

11.2.1 Check for drift

11.2.2 Check for dependencies

11.2.3 Check for differences in environments

11.3 Fixing

11.3.1 Reconcile drift

11.3.2 Reconcile differences in environments

11.3.3 Implement the original change