Kubernetes models your application with abstractions over the compute and networking layers. The abstractions allow Kubernetes to control network traffic and container lifetimes, so it can take corrective action if parts of your app fail. If your specifications contain enough detail, the cluster can find and fix temporary problems and keep applications online. These are self-healing applications, which ride out transient issues without needing a human to guide them. In this chapter, you’ll learn how to model self-healing in your own apps, using container probes to test for health and imposing resource limits so apps don’t soak up too much compute.
There are limits to Kubernetes’s healing powers, and you’ll learn about those in this chapter, too. We’ll mainly focus on how to keep your apps running without manual administration, but we’ll also revisit application updates. Updates are the most likely cause of downtime, so we’ll cover some additional features of Helm that can keep your apps healthy during update cycles.