Retention policies (how long to keep telemetry) and aggregation policies (how to summarize telemetry) are among the most important policies you will set for your telemetry systems. Sampling, a technique related to aggregation, uses statistical methods to summarize telemetry and is commonly used in distributed tracing. This chapter covers those policies and the trade-offs to consider when it comes time to set your own. For the most part, the trade-off is cost versus features: a familiar balancing act for any business.
Your retention policy determines how long your telemetry remains available to support the decisions people need to make. Many organizations find they need two retention periods: an online period, when everything is searchable, and an offline period (cold storage), when telemetry isn’t searchable but can be brought back online if needed. Your retention policy is also your most important tool for keeping the cost of your telemetry system down to something you’re willing to pay.
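To make the two-period idea concrete, here is a minimal sketch of how a system might decide which tier a piece of telemetry belongs to based on its age. The function name `retention_tier`, the 30-day and 365-day values, and the assumption that the offline period begins when the online period ends are all illustrative choices, not recommendations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods; choose values that fit your own needs and budget.
ONLINE_RETENTION = timedelta(days=30)    # searchable storage
OFFLINE_RETENTION = timedelta(days=365)  # cold storage, assumed to follow the online period


def retention_tier(event_time: datetime, now: datetime) -> str:
    """Return 'online', 'offline', or 'expired' for telemetry recorded at event_time."""
    age = now - event_time
    if age <= ONLINE_RETENTION:
        return "online"
    if age <= ONLINE_RETENTION + OFFLINE_RETENTION:
        return "offline"
    return "expired"


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(retention_tier(now - timedelta(days=10), now))
print(retention_tier(now - timedelta(days=100), now))
print(retention_tier(now - timedelta(days=400), now))
```

Under these assumed values, telemetry that is 10 days old is still searchable, telemetry that is 100 days old sits in cold storage, and telemetry that is 400 days old is past both periods and eligible for deletion. Shortening either period reduces cost at the expense of how far back you can look.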