This chapter covers
When Kafka is used as a managed service in the cloud, many operational concerns are abstracted away. Tasks like upgrades, monitoring, and broker management are handled by the provider, allowing teams to focus on building applications. But running Kafka on-premises is a different story. It demands deep operational expertise, ranging from performing safe software upgrades to carefully tuning configurations and continuously monitoring cluster health.
In this chapter, we’ll explore what it takes to maintain a robust, self-managed Kafka cluster. You’ll learn how to
- Safely perform hardware and software updates.
- Add brokers to the cluster or remove them from it.
- Modify configurations at the broker, topic, and partition level.
Understanding these maintenance tasks is essential for keeping your Kafka deployment resilient, efficient, and ready to scale.
This chapter is not a complete operational guide. Operational practices and tooling evolve rapidly and are often tightly coupled with specific environments and monitoring stacks. Instead, we’ll focus on foundational principles and best practices that apply across most Kafka deployments.