chapter six
6 Real-time data processing and analytics
This chapter covers:
- Definitions of real-time processing and real-time analytics, with sample use cases for each
- How best to organize data in “fast” storage
- Typical real-time data transformation scenarios
- A comparison of real-time processing services available from Amazon Web Services (AWS), Microsoft Azure and Google Cloud (GC)
By the end of this chapter you’ll be able to:
- Recognize valid use cases for real-time processing and differentiate between real-time scenarios
- Organize data for real-time processing
- Translate common data transformations into real-time processing
- Differentiate between the various real-time service offerings available from the three major cloud vendors
In this chapter, we’ll help you get a clear understanding of real-time, or streaming, data, one of the most popular features of a modern data platform.
We’ll cover the difference between real-time ingestion and real-time processing and walk through some examples of when to use one or both, showing different data platform designs.
We’ll also go deeper into how streaming data is organized, covering producers, consumers, messages, partitions and offsets. Then we’ll walk through some typical real-time data transformation use cases, with particular attention to data deduplication, file format conversion, real-time data quality checks, and combining batch and real-time data.
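To make the terms producer, consumer, message, partition and offset a little more concrete before we dig in, here is a minimal in-memory sketch in Python. It is not the API of any cloud service; the names `InMemoryStream`, `Consumer`, `produce` and `poll` are hypothetical and chosen only for illustration. The sketch assumes a stream is a set of append-only partitions, that a message's key decides which partition it lands in, and that each consumer tracks its own read position (offset) per partition.

```python
import zlib
from collections import defaultdict


class InMemoryStream:
    """Toy stream (hypothetical): each partition is an append-only list of messages."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Producer side: a stable hash of the key selects the partition,
        # so messages with the same key always land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        # The offset is the message's position within its partition.
        offset = len(self.partitions[partition]) - 1
        return partition, offset


class Consumer:
    """Toy consumer (hypothetical): remembers the next offset to read per partition."""

    def __init__(self, stream):
        self.stream = stream
        self.offsets = defaultdict(int)

    def poll(self, partition):
        # Return every message at or after the stored offset, then advance it.
        log = self.stream.partitions[partition]
        batch = log[self.offsets[partition]:]
        self.offsets[partition] = len(log)
        return batch


if __name__ == "__main__":
    stream = InMemoryStream(num_partitions=3)
    partition, first = stream.produce("user-1", "click")
    _, second = stream.produce("user-1", "purchase")
    print(first, second)             # offsets within the same partition

    consumer = Consumer(stream)
    print(consumer.poll(partition))  # both messages, in the order produced
    print(consumer.poll(partition))  # nothing new since the last poll
```

Because the consumer, not the stream, owns the offset, two consumers can read the same partition independently, and a consumer can reread messages by resetting its offset. Real services add durability, retention and consumer coordination on top of this basic picture.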