“Data heavy” isn’t a very scientific term, but this chapter is about running a class of application that isn’t just stateful but is also demanding about how it uses state. Databases are one example of this class. They need to run across multiple instances for high availability, each instance needs a local data store for fast access, and those independent data stores need to be kept in sync. The data has its own availability requirements, and you’ll need to run backups periodically to guard against terminal failure or corruption. Other data-intensive applications, like message queues and distributed caches, have similar requirements.
You can run these kinds of apps in Kubernetes, but you need to design around an inherent conflict: Kubernetes is a dynamic environment, and data-heavy apps typically expect to run in a stable one. Clustered applications, which expect to find peers at a known network address, won’t work nicely in a ReplicaSet, and backup jobs, which expect to read from a disk drive, won’t work well with PersistentVolumeClaims. You need to model your app differently if it has strict data requirements, and we’ll cover how to do that in this chapter with some more advanced controllers: StatefulSets, Jobs, and CronJobs.
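To make the "known network address" problem concrete, here is a minimal sketch of the naming scheme a StatefulSet gives you. Pods in a ReplicaSet get a random name suffix, so a peer list can't be written down in advance. Pods in a StatefulSet are named with a predictable ordinal, and when the set is paired with a headless Service, each Pod gets a stable DNS name. The helper function and the names used here (`db`, `db-headless`, the `default` namespace) are hypothetical, chosen only for illustration, and the sketch assumes the default `cluster.local` cluster domain.

```python
def statefulset_peer_addresses(name, service, namespace, replicas,
                               cluster_domain="cluster.local"):
    """Return the stable DNS names for the Pods of a StatefulSet.

    Hypothetical helper for illustration. It relies on two Kubernetes
    conventions: Pods are named <statefulset-name>-<ordinal>, and with a
    headless Service each Pod is addressable at
    <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>.
    """
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Illustrative values only; none of these names come from a real deployment.
peers = statefulset_peer_addresses("db", "db-headless", "default", replicas=3)
for peer in peers:
    print(peer)
```

Because the names are derived from nothing more than the StatefulSet name and an ordinal, a clustered app can compute the addresses of its peers before they exist, which isn't possible with the random Pod names a ReplicaSet generates.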