17 Building policies for telemetry retention and aggregation

 

This chapter covers

  • Creating retention policies for your telemetry
  • Creating aggregation policies for your metrics
  • Understanding the role sampling plays in telemetry and retention policies

Retention policies (how long to keep telemetry) and aggregation policies (how to summarize telemetry) are among the most important policies you will set for your telemetry systems. Sampling, a technique related to aggregation, uses statistical methods to summarize telemetry and is commonly used in distributed tracing. This chapter covers those policies and the trade-offs you need to consider when setting your own. For the most part, the trade-off is cost versus features, a familiar balancing act for any business.

Your retention policy determines how long your telemetry remains available to support the decisions people need to make. Many organizations find they need two retention periods: an online period, when everything is searchable, and an offline period (cold storage), when telemetry isn’t searchable but can be brought back online if needed. Retention policies are your most important tool for keeping the cost of your telemetry system down to something you’re willing to pay.
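The two-period idea can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the `RetentionPolicy` and `Tier` names, and the 30-day and 335-day periods, are assumptions made up for the example. It classifies a piece of telemetry as online (searchable), offline (cold storage), or expired, based on its age.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Tier(Enum):
    ONLINE = "online"    # searchable right now
    OFFLINE = "offline"  # in cold storage; must be brought online to search
    EXPIRED = "expired"  # past both periods; eligible for deletion


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical two-period retention policy."""
    online: timedelta   # how long telemetry stays searchable
    offline: timedelta  # additional time kept in cold storage

    def tier_for(self, written_at: datetime, now: datetime) -> Tier:
        """Return the tier telemetry written at `written_at` belongs to."""
        age = now - written_at
        if age <= self.online:
            return Tier.ONLINE
        if age <= self.online + self.offline:
            return Tier.OFFLINE
        return Tier.EXPIRED


# Periods chosen purely for illustration (assumed values).
policy = RetentionPolicy(online=timedelta(days=30), offline=timedelta(days=335))
now = datetime(2024, 1, 1, tzinfo=timezone.utc)

print(policy.tier_for(now - timedelta(days=7), now))    # Tier.ONLINE
print(policy.tier_for(now - timedelta(days=90), now))   # Tier.OFFLINE
print(policy.tier_for(now - timedelta(days=400), now))  # Tier.EXPIRED
```

Shortening either period lowers storage cost, and lengthening either one keeps telemetry available longer. That is the cost-versus-features trade-off in its simplest form.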

17.1 Creating a retention policy

17.1.1 Building a policy for centralized logging

17.1.2 Building a policy for metrics

17.1.3 Building a policy for distributed tracing

17.1.4 Building a policy for SIEM systems

17.2 Creating an aggregation policy

17.3 Using sampling to reduce costs and increase retention

Summary