10 Service level objectives

 

This chapter covers

  • What we mean by service level objectives
  • How to implement service level objectives
  • The tooling available in the SLO space
  • Considerations for implementing SLOs

At this point in the book, you should have a good idea of what ROI-driven observability means. There is, however, one operations topic we haven’t discussed yet that is needed to complete the picture: how satisfied are the consumers of a service, and how can we tell from data? The consumer doesn’t have to be an external customer; especially in larger organizations, consumers can be other business units. Now, don’t get me wrong: there’s nothing more motivating than a snarky tweet or a thoughtful comment on the orange site (aka Hacker News). However, wouldn’t it be nice if we could automate the whole process?

If you step back, you will find that DevOps and site reliability engineering (SRE) both took off in the past decade, the former as more of a bottom-up movement and the latter clearly driven by Google. The core concepts and ideas in this chapter are, indeed, a Google invention, and if you want to study every last detail, including best practices, I encourage you to head over to the Google SRE books site (https://sre.google/books/) and read everything. In this chapter, we will take a more practical approach, covering the fundamentals quickly and then showing how to use them.

10.1 The fundamentals of SLOs

10.1.1 Types of services

10.1.2 Service level indicator

10.1.3 Service level objective

10.1.4 Service level agreement

10.2 Implementing SLOs

10.2.1 High-level example

10.2.2 Using Prometheus to implement SLOs

10.2.3 Commercial SLO offerings

10.3 Considerations

Summary