13 Governance


This chapter covers

  • Strategies for ensuring data integrity and compatibility
  • Security measures to protect Kafka environments
  • Controlling resource allocation
  • Preventing cluster overload

Effective governance is crucial for any data-centric architecture, and Apache Kafka is no exception. As organizations rely on Kafka to manage their real-time data streams, structured oversight of data formats, access, and resource usage becomes increasingly important.

What happens when proper governance is absent? Without structured schema management, data inconsistency can become a serious problem. For example, a change to a data producer's schema, such as adding a new field, can break downstream consumers that aren't prepared to handle it, leading to application failures and costly downtime.
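The following minimal sketch illustrates the failure mode described above, assuming JSON-encoded product events and two hypothetical consumers (`strict_consumer` and `tolerant_consumer`, both invented for illustration). Version 2 of the event adds a `currency` field: the strict consumer rejects the new payload, while the tolerant one ignores fields it does not know about.

```python
import json

# Hypothetical product events; version 2 adds a "currency" field.
V1_EVENT = '{"product_id": "p-42", "price": 19.99}'
V2_EVENT = '{"product_id": "p-42", "price": 19.99, "currency": "EUR"}'

# The fields these (hypothetical) consumers were originally written against.
EXPECTED_FIELDS = {"product_id", "price"}


def strict_consumer(raw: str) -> dict:
    """Rejects any payload containing fields it was not built for."""
    event = json.loads(raw)
    unexpected = set(event) - EXPECTED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    return event


def tolerant_consumer(raw: str) -> dict:
    """Keeps only the fields it knows about and silently ignores the rest."""
    event = json.loads(raw)
    missing = EXPECTED_FIELDS - set(event)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return {name: event[name] for name in EXPECTED_FIELDS}
```

Calling `strict_consumer(V2_EVENT)` raises a `ValueError`, which in a real pipeline would surface as a crashed or stalled consumer. Tolerance in each consumer only goes so far, though; agreeing on compatibility rules for schema changes up front is the more robust approach.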

Similarly, the absence of robust security measures exposes the system to unauthorized access, which could result in data breaches, tampering, or even the loss of sensitive customer information. A lack of resource management can also lead to resource hogging, where a single client overwhelms the Kafka cluster, degrading performance for other users and applications.
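To make the idea of resource hogging concrete, here is a simplified sketch of a per-client byte-rate quota over a fixed window. It is an illustration under stated assumptions, not Kafka's actual implementation: the `ByteRateQuota` class and its methods are hypothetical. Instead of rejecting traffic, it returns a throttle delay long enough to bring the client's average rate back down to its quota, and each client is tracked independently so a heavy client cannot consume another client's budget.

```python
from dataclasses import dataclass, field


@dataclass
class ByteRateQuota:
    """Hypothetical per-client byte-rate quota over a fixed window (simplified sketch)."""

    bytes_per_sec: float
    window_sec: float = 1.0
    _usage: dict = field(default_factory=dict)  # client_id -> bytes seen this window

    def record(self, client_id: str, num_bytes: int) -> float:
        """Record traffic for a client and return the throttle delay in seconds."""
        used = self._usage.get(client_id, 0) + num_bytes
        self._usage[client_id] = used
        allowed = self.bytes_per_sec * self.window_sec
        if used <= allowed:
            return 0.0
        # Delay the client long enough that its average rate drops back to the quota.
        return (used - allowed) / self.bytes_per_sec

    def reset_window(self) -> None:
        """Start a new measurement window for all clients."""
        self._usage.clear()
```

With a quota of 1,000 bytes per second, a client that sends 2,000 bytes in one window is asked to wait 1 second, while a second client sending 100 bytes is not throttled at all.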

Consider our e-commerce platform, which uses Kafka to process real-time transactions. Without schema governance, inconsistent product data from different sources could lead to errors in inventory systems or incorrect pricing.
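One way to picture what schema governance adds in this scenario is a single shared definition that every source's records are checked against. The sketch below assumes a hypothetical `PRODUCT_SCHEMA` mapping field names to Python types and a hypothetical `validate` helper; real deployments would typically use a schema format and registry rather than hand-written checks. It catches a source that sends `price` as a string instead of a number.

```python
from typing import Any

# Hypothetical shared product schema: field name -> required Python type.
PRODUCT_SCHEMA: dict[str, type] = {"sku": str, "price": float, "stock": int}


def validate(record: dict[str, Any], schema: dict[str, type] = PRODUCT_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for name, expected in schema.items():
        if name not in record:
            errors.append(f"missing field '{name}'")
        # Exact type check, so for example a bool is not accepted where an int is required.
        elif type(record[name]) is not expected:
            actual = type(record[name]).__name__
            errors.append(f"field '{name}' is {actual}, expected {expected.__name__}")
    return errors
```

A record from a well-behaved source yields an empty list, while a record from a source that encodes prices as strings yields `["field 'price' is str, expected float"]` and can be rejected before it reaches the inventory or pricing systems.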

13.1 Schema management

13.1.1 Why do we need schemas?

13.1.2 Compatibility levels

13.1.3 Schema registries

13.1.4 Avro

13.2 Security

13.2.1 Transport encryption

13.2.2 Authentication

13.2.3 Authorization

13.2.4 Encryption at rest

13.2.5 End-to-end encryption

13.2.6 ZooKeeper security

13.2.7 Securing an unsecured Kafka cluster

13.3 Quotas in Kafka: Protecting the cluster from overload

Summary