7 Windowed computations

 

In this chapter

  • standard windowing strategies
  • time stamps in events
  • windowing watermark and late events

The attention span of a computer is only as long as its power cord.

—Unknown

In the previous chapters, we built a streaming job to detect fraudulent credit card transactions. Many analyzers could be used, each with a different model, but the basic idea is the same: compare each transaction with the previous activity on the same card. Windowing is designed for this type of work, and in this chapter we will learn about the windowing support in streaming systems.

Slicing up real-time data

As the popularity of the team’s new product has grown, so has the attention of new types of hackers. A group of hackers has started a new scheme involving gas stations.

Here’s how it works: They capture an innocent victim’s card information and duplicate it onto multiple new physical credit cards. From there, the attackers send the newly created fraudulent cards to others in the group and coordinate gas purchases on the same card from multiple locations around the world at the same time. They hope that by charging the card all at once, the cardholder will not notice the charges until it’s too late. The result is free gas. Why do they go to a global scale to get free tanks of gas? We can consider this a mystery.

How do we prevent this scam?

Breaking down the problem in detail


Two different contexts

Windowing in the fraud detection job

What exactly are windows?

Looking closer into the window

New concept: Windowing strategy

Fixed windows

Fixed windows in the windowed proximity analyzer

Detecting fraud with a fixed time window
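
To make the idea concrete, here is a minimal sketch of how a fixed time window could catch the gas station scheme: transactions are grouped by card and by window, and a card is flagged when it is used in more than one location within the same window. The Transaction fields, the suspicious_cards function, the one-hour window size, and the one-location threshold are all assumptions made for illustration. This is not the implementation of the chapter's windowed proximity analyzer or of any streaming framework's API.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Transaction(NamedTuple):
    # Hypothetical event shape, assumed for this sketch only.
    card_id: str
    location: str
    event_time: int  # seconds since the epoch


def suspicious_cards(
    transactions: Iterable[Transaction],
    window_seconds: int = 3600,  # assumed window size
    max_locations: int = 1,      # assumed threshold
) -> List[str]:
    """Return cards seen in too many locations within one fixed window."""
    # Key: (card, window index). Value: distinct locations in that window.
    locations_per_window = defaultdict(set)
    for tx in transactions:
        # Integer division assigns every event to exactly one window.
        window_index = tx.event_time // window_seconds
        locations_per_window[(tx.card_id, window_index)].add(tx.location)

    flagged = {
        card_id
        for (card_id, _), locations in locations_per_window.items()
        if len(locations) > max_locations
    }
    return sorted(flagged)


if __name__ == "__main__":
    events = [
        Transaction("card-A", "Paris", 10),
        Transaction("card-A", "Tokyo", 20),
        Transaction("card-B", "Paris", 10),
        Transaction("card-B", "Paris", 4000),
    ]
    print(suspicious_cards(events))
```

In this sketch, a card used in two cities just before and just after a window boundary lands in two different windows and is not flagged. That limitation of fixed windows is one reason to look at sliding windows as well.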

Fixed windows: Time vs. count

Sliding windows