6 Streaming Data: Bringing Everything Together

This chapter covers

  • Learning about the streaming data pipeline model and its distributed framework
  • Determining where streaming data applications and the data stream model meet
  • Identifying where algorithms and data structures fit in data streams
  • Recognizing the current duality of the data processing paradigm: discrete batches versus continuous streams
  • Setting up basic computing constraints and concepts inherent to data streams

Previous chapters introduced a number of algorithms and data structures for sketching an important characteristic of huge amounts of data. That data might reside in a database or, as you saw when HLL was applied to network traffic surveillance, arrive and expire at a lightning rate. In the pages that follow, we bring these algorithms and data structures together in the setting where they most often meet in practice: the data stream.

6.1    Streaming Data System – a meta-example

6.1.1   Bloom-join

6.1.2   De-duplication

6.1.3   Load balancing and tracking the network traffic

6.2    The future is coming: in discrete batches or as a continuous stream?

6.3    Practical constraints and concepts in data streams

6.3.1   Time

6.3.2   Small time and small space

6.3.3   Concept shifts and concept drifts

6.3.4   Sliding window model

6.4    Summary