6 Streaming Data: Bringing Everything Together

This chapter covers

Learning about the streaming data pipeline model and its distributed framework
Determining where streaming data applications and the data stream model meet
Identifying where algorithms and data structures fit in data streams
Recognizing the current duality of the data processing paradigm: batch versus stream processing
Understanding the basic computational constraints and concepts inherent to data streams

Previous chapters introduced a number of algorithms and data structures for sketching an important characteristic of huge amounts of data, whether that data resides in a database or, as you saw in the application of HyperLogLog (HLL) to network traffic surveillance, arrives and expires at a lightning rate. In the pages that follow, we bring these techniques together at the place where they usually meet anyway: the data stream.

6.1 Streaming Data System – a meta-example

6.1.1 Bloom-join

6.1.2 De-duplication

6.1.3 Load balancing and tracking network traffic

6.2 The future is coming: in discrete batches or as a continuous stream?

6.3 Practical constraints and concepts in data streams

6.3.1 Time

6.3.2 Small time and small space

6.3.3 Concept shifts and concept drifts

6.3.4 Sliding window model

6.4 Summary