8 Kafka storage

This chapter covers:

Moving data and retention in Kafka
Tools to help with data movement
Data architectures Kafka enables

So far we have treated our data as moving into and out of Kafka for brief periods of time. Another decision to analyze is where your data should live for the longer term. When we think of databases like MySQL or MongoDB, we do not always think about whether or how that data expires; we assume it will be there for the majority of the application’s lifetime. In comparison, Kafka storage logically sits between a database and a traditional message broker, especially if we think of message brokers as holding onto messages only until a client consumes them, as is often the case with other brokers.
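To make that spectrum concrete, here is a minimal sketch of how a retention choice might be expressed as topic-level settings. The helper name `retention_config` and the three example durations are assumptions made for illustration. Only the keys `retention.ms` and `cleanup.policy` are actual Kafka topic configuration names, and the dictionary is built locally rather than sent to a broker.

```python
from datetime import timedelta


def retention_config(keep_for=None):
    """Build topic-level retention settings as a dict of strings.

    Hypothetical helper for illustration only. keep_for is a timedelta,
    or None to keep records without an age limit.
    """
    # retention.ms = -1 means no time-based limit is applied to the topic.
    retention_ms = -1 if keep_for is None else int(keep_for.total_seconds() * 1000)
    return {
        "cleanup.policy": "delete",  # expire old log segments rather than compact them
        "retention.ms": str(retention_ms),
    }


# Short-lived, closer to how a traditional message broker treats data.
broker_like = retention_config(timedelta(hours=1))

# One week of history.
week = retention_config(timedelta(days=7))

# No age limit, closer to how we think about a database.
database_like = retention_config()

print(broker_like)
print(week)
print(database_like)
```

The same keys could be supplied when a topic is created or altered. Which end of the spectrum fits depends on the tradeoffs discussed next.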

As in all of engineering, each decision involves tradeoffs, and the right choice depends on your business needs and requirements. Let’s look at a couple of options for storing and moving data in your environment.

8.1 How Long to Store Data

8.2 Data Pipelines

8.2.1 Keeping the original event

8.2.2 Moving away from a batch mindset

8.3 Tools

8.3.1 Apache Flume

8.3.2 Debezium

8.3.3 Secor

8.4 Bringing data back into Kafka

8.5 Architectures with Kafka

8.5.1 Lambda Architecture

8.5.2 Kappa Architecture

8.6 Multiple Cluster setups

8.6.1 Scaling by adding Clusters

8.6.2 Active-Active

8.6.3 Active-Passive

8.7 Cloud and Container Based Storage Options

8.7.1 Amazon Elastic Block Store

8.7.2 Kubernetes Clusters

8.8 Summary