concept Secor in category kafka

This is an excerpt from Manning's book Kafka in Action MEAP V14.
Secor is an interesting project from Pinterest that has been around since 2014 and aims to help persist Kafka log data to a variety of storage options, including S3 and Google Cloud Storage [1]. The output formats are also varied and include sequence files, ORC, and Parquet files, among others. As always, one major benefit of projects that keep their source code in a public repository is that you can see how other teams have implemented requirements that might be similar to yours. Figure 8.3 shows how Secor acts as a consumer of your Kafka cluster, very much like any other application. Adding a consumer to your cluster for data backup is not a big deal; it leverages the way Kafka has always handled multiple readers of the events!
Figure 8.3. Secor acting as a consumer and placing data into S3
Secor runs as a Java process and can be fed your specific configuration. In effect, it acts as another consumer of your existing topic(s), gathering data that ends up in a destination of your choice, such as an S3 bucket. Secor does not get in the way of your other consumers and lets you keep a copy of your events so that they are not lost once Kafka retention removes data from its own logs. Listing 8.3 shows an example of how you can start the Secor Java application with parameters for your own usage.
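To make the "just another consumer" idea concrete, here is a toy in-memory model in Python. It is not Listing 8.3 and does not use Secor or any Kafka client API. `ToyTopic`, the `"backup"` and `"app"` group names, and the `archive` list standing in for an S3 bucket are all hypothetical names chosen for illustration. Retention is simplified to "keep the last N records", whereas real Kafka retention is driven by time or size. The sketch only aims to show two things from the paragraph above: each consumer group tracks its own position, and a backup consumer that keeps up retains events after the topic has dropped them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical single-partition topic held in memory (not a Kafka API)."""
    retention: int                                  # assumption: keep only the last N records
    log: list = field(default_factory=list)         # records currently retained
    base_offset: int = 0                            # offset of the oldest retained record
    committed: dict = field(default_factory=dict)   # consumer group -> next offset to read

    def produce(self, record):
        self.log.append(record)
        # Simplified retention: drop the oldest records once the limit is exceeded.
        overflow = len(self.log) - self.retention
        if overflow > 0:
            del self.log[:overflow]
            self.base_offset += overflow

    def poll(self, group):
        # Each group has its own position; reading never removes records
        # and never moves another group's position.
        start = max(self.committed.get(group, 0), self.base_offset)
        records = self.log[start - self.base_offset:]
        self.committed[group] = self.base_offset + len(self.log)
        return records


topic = ToyTopic(retention=3)
archive = []  # stand-in for the backup destination, such as an S3 bucket

for i in range(5):
    topic.produce(f"event-{i}")
    archive.extend(topic.poll("backup"))  # Secor-like backup consumer keeps up

app_view = topic.poll("app")  # an ordinary application reading for the first time

print(archive)   # all five events survive in the archive
print(app_view)  # only the records the topic still retains
```

In this model the `"app"` group sees only the three most recent events, because the older two were removed by retention, while `archive` still holds all five. The backup consumer's polling has no effect on what the `"app"` group reads.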