chapter ten

10 Healthy microservices

 

This chapter covers:

  • Techniques to ensure your microservices remain healthy
  • Logging and monitoring for microservices
  • Debugging microservices
  • Patterns for reliability and fault tolerance

Errors happen. Code has bugs. Hardware, software and networks can be unreliable.

Failures happen in all types of applications, not just microservices. But microservices applications are more complex, so problems can become considerably worse as our application grows. The more microservices we maintain, the greater the chance that, at any given time, some of them will be misbehaving.

We can’t avoid problems entirely. Whether they are caused by human error or unreliable infrastructure, problems are a certainty. But just because problems can’t always be avoided doesn’t mean we shouldn’t try to mitigate them. A well-engineered application anticipates and accounts for problems, even when the specific nature of some problems can’t be predicted.

As our application grows more complex, we’ll need techniques to combat problems and keep our microservices healthy. Our industry has developed many best practices and patterns for dealing with problems, and we’ll cover some of the most important in this chapter. Following this guidance will make your application run more smoothly and reliably, which means less stress and an easier recovery from problems when they do happen.

10.1  Maintaining healthy microservices

10.2  Monitoring your microservices

10.2.1    Logging in development

10.2.2    Error handling

10.2.3    Logging with Docker Compose

10.2.4    Basic logging with Kubernetes

10.2.5    Roll your own log aggregation for Kubernetes

10.2.6    Enterprise logging, monitoring and alerts

10.2.7    Automatic restarts with Kubernetes health checks

10.2.8    Tracing across microservices

10.3  Debugging microservices

10.3.1    The debugging process

10.3.2    Debugging production microservices

10.4  Reliability and recovery

10.4.1    Practice defensive programming

10.4.2    Practice defensive testing

10.6  Summary