chapter six

Chapter 6. Storing the analyzed or collected data

This chapter covers

Understanding why we need to store the data
Learning what to store
Choosing your storage solution

Up to this point we have spent time discussing the architecture and the algorithms commonly used in stream-processing applications. This chapter focuses on what to do with the data after you have processed it. Our focus will be less on the performance of any persistent store we use and more on how to choose a persistent store if one is needed for your application.

First let’s recap where we are in the overall architecture. Figure 6.1 shows our over-arching architecture with the focus of this chapter highlighted.

Figure 6.1. Overall streaming architecture blueprint with the chapter focus identified

We’ll talk about the storage options from a streaming perspective, the key attributes of the most popular products, and things to consider when you use one. To set the stage, let’s begin with the four options we have when our analysis is done and the data is ready to be consumed. We can do any of the following:

Analyze and discard the data
Analyze and push the data back into the streaming platform
Analyze and store the data for real-time usage
Analyze and store the data for batch/offline access

Chapter 6. Storing the analyzed or collected data

This chapter covers

Figure 6.1. Overall streaming architecture blueprint with the chapter focus identified

6.1. When you need long-term storage

6.2. Keeping it in-memory

6.3. Use case exercises

6.4. Summary