8 Running data-heavy apps with StatefulSets and Jobs
"Data-heavy" isn't a very scientific term, but this chapter is about running a class of application that isn't just stateful but is also very demanding about how it uses state. Databases are one example. They need to run across multiple instances for high availability, each instance needs a local data store for fast access, and those independent data stores need to be kept in sync. The data has its own availability requirements, and you'll need to run backups periodically to guard against terminal failure or corruption. Other data-intensive applications, like message queues and distributed caches, have similar requirements.
You can run those kinds of apps in Kubernetes, but you need to design around an inherent conflict: Kubernetes is a dynamic environment, and data-heavy apps typically expect to run in a stable environment. Clustered applications that expect to find peers at a known network address won't work nicely in a ReplicaSet, where Pods get random names and a replacement Pod comes up with a new IP address, and backup jobs that expect to read from a disk drive won't work well with PersistentVolumeClaims. You need to model your app differently if it has strict data requirements, and we'll cover how to do that in this chapter with some more advanced controllers: StatefulSets, Jobs, and CronJobs.