Chapter 7. Archiving events

 

This chapter covers

  • Why you should be archiving raw events from your unified log
  • The what, where, and how of archiving
  • Archiving events from Kafka to Amazon S3
  • Batch-processing an event archive via Spark and Elastic MapReduce

So far, our focus has been on reacting to our events in-stream, as they flow through our unified log. We have seen some great near-real-time use cases for these event streams, including detecting abandoned shopping carts and monitoring our servers. You would be correct in thinking that the immediacy of a unified log is one of its most powerful features.

But in this chapter, we will take a slight detour and explore another path that our event streams can take: into a long-term store, or archive, of all our events. To continue with the flowing water analogies so beloved of data engineers: if the unified log is our Mississippi River, our event archive is our bayou:[1] a sleepy but vast backwater, fertile for exploration.

1. A more formal definition of bayou can be found at Wikipedia: https://en.wikipedia.org/wiki/Bayou.

There are many good reasons to archive our events like this; we will first make the case for archiving in an archivist’s manifesto. With the case made, we will then introduce the key building blocks of a good event archive: which events to archive, where to store them, and what tooling to use.
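To make the idea concrete before we dive in, here is a minimal sketch of the archiving pattern (our own illustration, not code from this chapter; the function and file names are hypothetical): take a micro-batch of raw events from a stream and append them, unmodified, to a date-partitioned file of newline-delimited JSON. This is broadly the same layout that tools such as Secor produce when sinking Kafka topics to Amazon S3.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def archive_batch(events, archive_dir, partition_key):
    """Append a batch of raw events, one JSON object per line,
    to a file under a date-based partition directory."""
    path = os.path.join(archive_dir, partition_key)
    os.makedirs(path, exist_ok=True)
    filename = os.path.join(path, "events.jsonl")
    with open(filename, "a") as f:
        for event in events:
            # Archive the event exactly as received -- no filtering
            # or transformation, so we can always reprocess later
            f.write(json.dumps(event) + "\n")
    return filename

# Simulated micro-batch of raw events (stand-ins for Kafka records)
events = [
    {"event": "cart_abandoned", "shopper": "alice"},
    {"event": "server_overheating", "host": "web-1"},
]

archive_dir = tempfile.mkdtemp()  # in real life: an S3 bucket or HDFS path
day = datetime.now(timezone.utc).strftime("date=%Y-%m-%d")
written = archive_batch(events, archive_dir, day)
```

The key design choice, which the manifesto below will argue for, is that the archive stores the events raw: downstream batch jobs (for example, Spark on Elastic MapReduce) can then reinterpret the full history however they like.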

7.1. The archivist’s manifesto

 
 
 

7.2. A design for archiving

 
 
 

7.3. Archiving Kafka with Secor

 
 
 

7.4. Batch processing our archive

 
 

Summary

 
 
 
 