8 Kafka storage

This chapter covers

How long to retain data
Data movement into and out of Kafka
Data architectures Kafka enables
Storage for cloud instances and containers

So far we have thought of our data as moving into and out of Kafka for brief periods of time. Another decision to consider is where our data should live long term. When you use databases like MySQL or MongoDB®, you may not always think about if or how that data expires. Rather, you know that the data is (likely) going to exist for the majority of your application’s lifetime. In comparison, Kafka’s storage logically sits somewhere between the long-term storage of a database and the transient storage of a message broker, especially if we think of message brokers as holding onto messages only until a client consumes them, as is often the case. Let’s look at a couple of options for storing and moving data in our Kafka environment.

8.1 How long to store data

8.2 Data movement

8.2.1 Keeping the original event

8.2.2 Moving away from a batch mindset

8.3 Tools

8.3.1 Apache Flume

8.3.2 Red Hat® Debezium™

8.3.3 Secor

8.3.4 Example use case for data storage

8.4 Bringing data back into Kafka

8.4.1 Tiered storage

8.5 Architectures with Kafka

8.5.1 Lambda architecture

8.5.2 Kappa architecture

8.6 Multiple cluster setups

8.6.1 Scaling by adding clusters

8.7 Cloud- and container-based storage options

8.7.1 Kubernetes clusters

Summary

References