8 Join operations

 

In this chapter

  • correlating different types of events in real time
  • when to use inner and outer joins
  • applying windowed joins

An SQL query goes into a bar, walks up to two tables, and asks, “Can I join you?”

—Anonymous

If you have ever used an SQL (structured query language) database, you have most likely used, or at least learned about, the join clause. In the streaming world, the join operation may not be as essential as it is in the database world, but it is still a very useful concept. In this chapter, we are going to learn how joins work in a streaming context. We will use the join clause in databases to introduce the calculation and then talk about the details in streaming systems. If you are already familiar with the clause, please feel free to skip the introductory pages.

Joining emission data on the fly

Well, what do you know? The chief got lucky and landed an opportunity to track the emissions of cars in Silicon Valley, California. Nice, right?

Well, with every great opportunity come challenges. The team is going to need to find a way to join, on the fly, the events coming from vehicles in specific city locations with those vehicles’ estimated emission rates. How will they do it? Let’s check it out.
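To make the goal concrete before looking at the team's design, here is a minimal sketch in Python of what joining vehicle events with emission rates could look like. It is not the job described in this chapter: the field names (`make`, `model`, `zone`), the lookup table `EMISSION_RATES`, and all of its values are made up for illustration, and a real streaming job would read events from a live source rather than a list.

```python
from collections import defaultdict

# Hypothetical lookup table: (make, model) -> estimated emission rate.
# The keys, values, and units are invented for this sketch.
EMISSION_RATES = {
    ("make_a", "model_x"): 5.0,
    ("make_b", "model_y"): 8.0,
}


def join_with_emissions(vehicle_events, rates):
    """Join each vehicle event with its emission rate as the event arrives."""
    for event in vehicle_events:
        rate = rates.get((event["make"], event["model"]))
        if rate is None:
            # Inner-join behavior: events without a matching rate are dropped.
            continue
        yield {"zone": event["zone"], "emission": rate}


def total_by_zone(joined_events):
    """Sum the joined emission values per zone."""
    totals = defaultdict(float)
    for row in joined_events:
        totals[row["zone"]] += row["emission"]
    return dict(totals)


# Hypothetical vehicle events, each tagged with the zone the vehicle is in.
events = [
    {"vehicle_id": 1, "make": "make_a", "model": "model_x", "zone": 3},
    {"vehicle_id": 2, "make": "make_b", "model": "model_y", "zone": 3},
    {"vehicle_id": 3, "make": "make_c", "model": "model_z", "zone": 7},
]

totals = total_by_zone(join_with_emissions(events, EMISSION_RATES))
print(totals)
```

The third vehicle has no entry in the table, so it contributes nothing to its zone. Whether unmatched events should be dropped or kept is exactly the inner-versus-outer join question this chapter covers.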

The emissions job version 1

The emission resolver

Accuracy becomes an issue

The enhanced emissions job

Focusing on the join

What is a join again?

How the stream join works

Stream join is a different kind of fan-in

Vehicle events vs. temperature events

Table: A materialized view of streaming

Vehicle events are less efficient to materialize

Data integrity quickly became an issue

What’s the problem with this join operator?