11 Building a monitoring system
This chapter covers
- Understanding what signals to gather from running applications
- Building a monitoring system to collect metrics
- Learning how to use the collected signals to set up alerts
- Observing the behavior of individual services and their interactions as a system
You’ve now set up an infrastructure to run your services and deployed multiple components that you can combine to provide functionality to your users. In this chapter and the next, we’ll consider how you can always know how those components are interacting and how the infrastructure is behaving. It’s fundamental to know as early as possible when something isn’t behaving as expected. In this chapter, we’ll focus on building a monitoring system so you can collect relevant metrics, observe the system’s behavior, and set up alerts that let you keep your systems running smoothly by acting preemptively. When you can’t act preemptively, you’ll at least be able to quickly pinpoint the areas that need your attention so you can address any issues. You should also instrument as much as possible: data you don’t use today may turn out to be useful someday.