6 Streaming systems review and a glimpse ahead


In this chapter

  • a review of the concepts we’ve learned
  • an introduction to the more advanced concepts covered in part 2

Technology makes it possible for people to gain control over everything, except over technology.

—John Tudor

Now that we have learned the basic concepts of streaming systems in the previous chapters, it is time to take a short break and review them. We will also take a peek at the content of the later chapters and get ready for the next adventure.

Streaming system pieces

A job is an application that loads incoming data and processes it. All streaming jobs are built from four basic pieces: event, stream, source, and operator. Note that these concepts may be named differently in different frameworks.
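To make the four pieces concrete, here is a minimal Python sketch. The class names and the doubling operator are invented for illustration and do not come from any particular framework.

```python
class Event:
    """An event is a single piece of data entering the system."""
    def __init__(self, data):
        self.data = data

class Source:
    """A source brings data from the outside world into the job as events."""
    def __init__(self, raw_inputs):
        self.raw_inputs = raw_inputs

    def events(self):
        for raw in self.raw_inputs:
            yield Event(raw)

class Operator:
    """An operator processes incoming events and emits new ones."""
    def apply(self, event):
        yield Event(event.data * 2)  # an example transformation

# A stream is simply the ongoing flow of events between components.
source = Source([1, 2, 3])
doubler = Operator()
results = [out.data for e in source.events() for out in doubler.apply(e)]
print(results)  # → [2, 4, 6]
```

The stream itself has no class here on purpose: it is the connection between a source (or operator) and the operator downstream of it, represented by the generator of events flowing between them.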

Parallelization and event grouping

Processing events one by one is usually not sufficient in the real world. Parallelization is critical for solving problems at scale (i.e., handling more load). When parallelization is used, it is necessary to understand how events are routed among the parallel instances with a grouping strategy.
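The two common grouping strategies can be sketched in a few lines. The function names and the example events below are illustrative; real frameworks provide these strategies under their own names.

```python
import random

def shuffle_grouping(event, parallelism, rng=random):
    """Route the event to a random instance, balancing load evenly."""
    return rng.randrange(parallelism)

def fields_grouping(event, parallelism, key):
    """Route events with the same key to the same instance,
    so per-key state (e.g., per-card totals) stays in one place."""
    return hash(key(event)) % parallelism

# Usage: transactions with the same card id go to the same instance.
events = [
    {"card": "A", "amount": 10},
    {"card": "B", "amount": 5},
    {"card": "A", "amount": 99},
]
targets = [fields_grouping(e, 4, key=lambda ev: ev["card"]) for e in events]
assert targets[0] == targets[2]  # same card → same instance
```

Shuffle grouping spreads load evenly but gives no locality guarantee; fields grouping trades perfectly even load for the guarantee that related events land on the same instance.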

DAGs and streaming jobs

A DAG, or directed acyclic graph, is used to represent the logical structure of a streaming job and how data flows through it. In more complicated streaming jobs, such as the fraud detection system, one component can have multiple upstream components (fan-in) and/or multiple downstream components (fan-out).

DAGs are useful for representing streaming jobs.
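A DAG can be represented as a simple adjacency list. The component names below are made up to mirror a fraud-detection-style job and are not from the original system.

```python
# Each key maps a component to the components downstream of it.
dag = {
    "transaction-source": ["avg-ticket-analyzer", "proximity-analyzer"],  # fan-out
    "avg-ticket-analyzer": ["score-aggregator"],
    "proximity-analyzer": ["score-aggregator"],  # fan-in at the aggregator
    "score-aggregator": [],
}

def upstream(component):
    """All components that feed events into the given component."""
    return sorted(c for c, downs in dag.items() if component in downs)

print(upstream("score-aggregator"))
# → ['avg-ticket-analyzer', 'proximity-analyzer']
```

The "acyclic" part matters: because events only flow forward, no event can loop back to a component it already passed through, which keeps the job's data flow easy to reason about.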

Delivery semantics (guarantees)

Delivery semantics, also known as delivery guarantees, define how a streaming system handles events when failures occur. The three common guarantees are at-most-once, at-least-once, and exactly-once. In the credit card fraud detection system, the choice of guarantee determines whether a transaction can be missed entirely or processed more than once.
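The difference between the first two guarantees can be shown with a toy simulation. The `flaky` processor and its failure pattern are invented purely for illustration.

```python
def at_most_once(events, process):
    """Send each event once; a failure means the event is lost."""
    done = []
    for e in events:
        try:
            done.append(process(e))
        except RuntimeError:
            pass  # no retry: the event is dropped
    return done

def at_least_once(events, process):
    """Resend until acknowledged; retries can create duplicates upstream."""
    done = []
    for e in events:
        while True:
            try:
                done.append(process(e))
                break  # acknowledgment received
            except RuntimeError:
                continue  # failure: resend the event
    return done

calls = {"n": 0}
def flaky(e):
    """A processor where every third attempt fails (for demonstration)."""
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        raise RuntimeError("lost")
    return e

events = [1, 2, 3, 4]
print(at_most_once(events, flaky))  # → [1, 2, 4]: event 3 was lost
```

Exactly-once (more precisely, effectively-once) is not shown here: it typically requires extra machinery such as checkpointing and transactional state so that retried events are not counted twice.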
