Chapter 7. Archiving events

 

This chapter covers

  • Why you should be archiving raw events from your unified log
  • The what, where, and how of archiving
  • Archiving events from Kafka to Amazon S3
  • Batch-processing an event archive via Spark and Elastic MapReduce

So far, our focus has been on reacting to our events in-stream, as they flow through our unified log. We have seen some great near-real-time use cases for these event streams, including detecting abandoned shopping carts and monitoring our servers. You would be correct in thinking that the immediacy of a unified log is one of its most powerful features.

But in this chapter, we will take a slight detour and explore another path that our event streams can take: into a long-term store, or archive, of all our events. To continue with the flowing water analogies so beloved of data engineers: if the unified log is our Mississippi River, our event archive is our bayou:[1] a sleepy but vast backwater, fertile for exploration.

1. A more formal definition of bayou can be found at Wikipedia: https://en.wikipedia.org/wiki/Bayou.

There are many good reasons to archive our events like this; we will first make the case for archiving in an archivist’s manifesto. With the case made, we will then introduce the key building blocks of a good event archive: which events to archive, where to store them, and what tooling to use.
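To make the idea concrete before we dive in, here is a minimal sketch of the archiving pattern (our own illustration, not code from this chapter; the function and file names are hypothetical): take a micro-batch of raw events from a stream and append them, unmodified, to a date-partitioned file of newline-delimited JSON. This is broadly the same layout that tools such as Secor produce when sinking Kafka topics to Amazon S3.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def archive_batch(events, archive_dir, partition_key):
    """Append a batch of raw events, one JSON object per line,
    to a file under a date-based partition directory."""
    path = os.path.join(archive_dir, partition_key)
    os.makedirs(path, exist_ok=True)
    filename = os.path.join(path, "events.jsonl")
    with open(filename, "a") as f:
        for event in events:
            # Archive the event exactly as received -- no filtering
            # or transformation, so we can always reprocess later
            f.write(json.dumps(event) + "\n")
    return filename

# Simulated micro-batch of raw events (stand-ins for Kafka records)
events = [
    {"event": "cart_abandoned", "shopper": "alice"},
    {"event": "server_overheating", "host": "web-1"},
]

archive_dir = tempfile.mkdtemp()  # in real life: an S3 bucket or HDFS path
day = datetime.now(timezone.utc).strftime("date=%Y-%m-%d")
written = archive_batch(events, archive_dir, day)
```

The key design choice, which the manifesto below will argue for, is that the archive stores the events raw: downstream batch jobs (for example, Spark on Elastic MapReduce) can then reinterpret the full history however they like.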

7.1. The archivist’s manifesto

 
 
 

7.2. A design for archiving

 
 
 

7.3. Archiving Kafka with Secor

 
 
 

7.4. Batch processing our archive

 
 

Summary

 
 
 
 