Managing the health and ongoing maintenance of the enterprise or platform is something we often take for granted. We put a great deal of effort into design, build, and deployment, but sometimes we forget what it takes to keep the lights on, head off potential problems, and create an environment where recovery is possible should something go off the rails.
In this chapter, we are going to touch on some of the higher-level areas that have caught out many CTOs. Most of them are obvious procedural items, but even the obvious things are sometimes overlooked. For example, many of us believe we are fully backing up our platforms, but when was the last time you actually tried to restore a system? How sure are you that the backup covers everything you need?