chapter six
6 Real-time data processing and analytics
This chapter covers:
- Definitions of real-time processing and real-time analytics, with some associated sample use cases
- How best to organize data in “fast” storage
- Understanding typical real-time data transformation scenarios
- Organizing data for real-time use
- Understanding common data transformations and translating them into real-time processing
- Comparing real-time processing services available from Amazon Web Services (AWS), Microsoft Azure and Google Cloud (GC)
In this chapter, we’ll help you get a clear understanding of real-time (or streaming) data, one of the most popular features of a modern data platform.
We’ll cover the difference between real-time ingestion and real-time processing and walk through some examples of when to use one or both, showing different data platform designs.
We’ll also go deeper into how streaming data is organized, with producers, consumers, messages, partitions, and offsets. Then we’ll walk through some typical real-time data transformation use cases, with particular attention to data deduplication, file format conversion, real-time data quality checks, and combining batch and real-time data.