In this chapter
“Technology makes it possible for people to gain control over everything, except over technology.”
—John Tudor
After learning the basic concepts of streaming systems in the previous chapters, it is time to take a short break and review them in this chapter. We will also take a peek at what the later chapters cover and get ready for the next adventure.
A job is an application that loads incoming data and processes it. Every streaming job is built from four pieces: events, streams, sources, and operators. Note that these concepts may go by different names in different frameworks.
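To make these four pieces concrete, here is a minimal sketch in Java. The Event, Source, and Operator types below are illustrative placeholders, not the API of any particular framework:

```java
import java.util.List;

// An event is a single, immutable piece of data flowing through the job.
class Event {
  final String data;
  Event(String data) { this.data = data; }
}

// A source brings events into the job from the outside world (e.g., a queue).
interface Source {
  // Read newly arrived events and hand them to the collector.
  void getEvents(List<Event> eventCollector);
}

// An operator applies processing logic to each incoming event and may
// emit new events into its outgoing stream.
interface Operator {
  void apply(Event event, List<Event> eventCollector);
}

// A stream is the continuous, unbounded flow of events connecting one
// component (a source or operator) to the next.
```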
Processing events one by one is usually not fast enough in the real world. Parallelization is critical for solving problems at scale (i.e., handling more load). When a component is parallelized, it is necessary to understand how events are routed to its instances with a grouping strategy, as sketched below.
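Two common strategies are fields (hash) grouping and shuffle grouping. The sketch below assumes a hypothetical GroupingStrategy interface that picks which parallel instance receives each event:

```java
// Chooses which of the parallel instances (0..parallelism-1) gets an event.
// This interface is a hypothetical illustration, not a framework API.
interface GroupingStrategy {
  int chooseInstance(String eventKey, int parallelism);
}

// Fields (hash) grouping: events with the same key always reach the same
// instance, which stateful logic such as per-key counting relies on.
class FieldsGrouping implements GroupingStrategy {
  @Override
  public int chooseInstance(String eventKey, int parallelism) {
    // floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(eventKey.hashCode(), parallelism);
  }
}

// Shuffle grouping: events are spread round-robin, which balances load but
// gives no guarantee about which instance sees which key.
class ShuffleGrouping implements GroupingStrategy {
  private int next = -1;
  @Override
  public int chooseInstance(String eventKey, int parallelism) {
    next = (next + 1) % parallelism;
    return next;
  }
}
```

Which strategy fits depends on the operator: stateless operators can usually take shuffle grouping for better load balance, while stateful operators typically need fields grouping so related events land on the same instance.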
A DAG, or directed acyclic graph, represents the logical structure of a streaming job and how data flows through it. In more complicated streaming jobs, like the fraud detection system, one component can have multiple upstream components (fan-in) and/or multiple downstream components (fan-out). DAGs are a useful tool for describing and reasoning about any streaming job.
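As a rough illustration, the sketch below wires up a fraud-detection-style DAG with one fan-out and one fan-in. The Node class and the component names are hypothetical stand-ins for whatever components a real job would use:

```java
import java.util.ArrayList;
import java.util.List;

// A node in the job's DAG; edges point to downstream components.
class Node {
  final String name;
  final List<Node> downstream = new ArrayList<>();
  Node(String name) { this.name = name; }
  void connectTo(Node next) { downstream.add(next); }
}

public class FraudDetectionDag {
  public static void main(String[] args) {
    Node source = new Node("transaction source");
    Node avgTicket = new Node("average ticket analyzer");
    Node windowedTxn = new Node("windowed transaction analyzer");
    Node aggregator = new Node("score aggregator");

    // Fan-out: one upstream component feeds two downstream analyzers.
    source.connectTo(avgTicket);
    source.connectTo(windowedTxn);

    // Fan-in: both analyzers feed a single downstream aggregator.
    avgTicket.connectTo(aggregator);
    windowedTxn.connectTo(aggregator);
  }
}
```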