16 Redacting and reprocessing telemetry

 

This chapter covers

  • Identifying toxic data and where it comes from
  • Cleaning up after toxic data spills
  • Reducing the scope of toxic data spills
  • Reprocessing cold storage to improve restorability

There are two big reasons why you might want to rewrite stored telemetry:

  • Regulated information, such as privacy- and health-related information and sometimes financial information, somehow got into your telemetry systems and needs to be removed before your organization has to notify customers and users of the breach (redaction). I call information like this toxic data because it requires special handling, and there are severe penalties for getting it wrong.
  • Upgrading a telemetry storage system often means reformatting backups or databases so that they remain restorable, and replacing one telemetry system with another means importing your old telemetry into the new system (reprocessing).

This chapter is about handling both of these concerns, which can certainly happen at the same time! When upgrading or replacing your storage, you have a great opportunity to redact things you don’t want in your telemetry systems. Most of what I talk about in this chapter focuses on toxic-information cleanup because it is the more complicated problem, but reprocessing matters just as much for long-term maintenance of telemetry systems.

16.1 Identifying toxic data and where it comes from

16.2 Redacting toxic information spills

16.3 Reprocessing telemetry to support upgrades

16.4 Isolating toxic data to reduce cleanup costs

Summary