4 Agents & Instrumentation


This chapter covers

  • What we mean by agents and instrumentation
  • What types of agents exist, and for which signal types
  • What OpenTelemetry is and how you can benefit from it
  • Criteria for selecting an agent

In this chapter we will focus on how to get the signals from the sources we discussed in Chapter 3 to the destinations, which we will cover in Chapter 5 and Chapter 6.

In a nutshell, in this chapter we will learn how to instrument code (and how to automate that task), and how to select and deploy agents that collect, aggregate, filter, downsample, redact, and route logs, metrics, and traces.

The telemetry industry is, at the time of writing, in the midst of a tectonic shift. This transformation, away from vendor-specific or signal-specific instrumentation and agents and toward an industry standard called OpenTelemetry, started in 2019. In this book we refer to both vendor-specific and signal-specific agents as traditional agents, in contrast to OpenTelemetry.

Cloud providers such as AWS, Azure, and Google Cloud, as well as observability vendors such as Datadog, Splunk, New Relic, Lightstep, Dynatrace, Honeycomb, and Grafana Labs, have rallied behind OpenTelemetry. In effect, the telemetry industry has decided to treat telemetry as table stakes, to commoditize it, and to compete instead on the destinations (storage & query as well as front-ends).

4.1 Log Routers

4.1.1 Fluentd & Fluent Bit

4.1.2 Other Log Routers

4.2 Metrics Collection

4.2.1 Prometheus

4.2.2 Other Metrics Agents

4.3 OpenTelemetry

4.3.1 Instrumentation

4.3.2 Collector

4.4 Other Agents

4.5 Selecting an Agent

4.5.1 Security For and Of the Agent

4.5.2 Agent Performance and Resource Usage

4.5.3 Agent Non-Functional Requirements (NFRs)

4.6 Summary