10. Ingestion through structured streaming

This chapter covers

  • Understanding streaming
  • Building your first streaming ingestions
  • Capturing the various sources of data in streaming
  • Building an application that takes two streams
  • Differentiating discretized streaming and structured streaming

Look at your data from a few thousand meters (or feet, if you are stuck with the imperial system) and focus on the data-generation part. Do you see systems that generate batches of data, or systems that generate data continuously? Systems that deliver a continuous flow of data, also known as streams, were less common a few years ago. Streams are steadily gaining traction, and understanding them is the focus of this chapter.

Your mobile phone regularly pings cell towers, for example. If it’s a smartphone (highly probable, based on the audience of this book), it will also check email and more.

A bus travelling through a (smart) city sends its GPS coordinates.

The cash register at your supermarket's checkout counter generates data as each item is passed in front of the scanner, whether by the cashier or by you. A transaction is processed as you pay.

When you bring your car to the garage, a flow of information is collected, stored, and sent to other recipients, such as the manufacturer, insurance companies, or reporting companies.

10.1 What’s streaming?

10.2 Creating your first stream

10.2.1 Generating a file stream

10.2.2 Consuming the records

10.2.3 Getting records, not lines

10.3 Ingesting data from network streams

10.4 Dealing with multiple streams

10.5 Differentiating discretized and structured streaming

Summary