chapter two

2 Data ingestion for real-time machine learning

This chapter covers

Ingesting real-time event data
Using event-driven architectures to stream and persist real-time data
Advantages of event-driven architectures for real-time machine learning

Every machine learning project starts with a problem that a machine learning model can solve by learning patterns from data. The data available to you, and the data you are able to collect, define what is possible for the project. Real-time machine learning is no different. To start building real-time machine learning applications, you need to understand what real-time data is available to you and how to use it to produce useful predictions.

In the previous chapter, we explored how real-time data instances are used both to generate real-time inferences and to train online models. Before either can happen, the data must be ingested and transmitted from its origin to the process that performs inference or training. Handling this continuous flow of data, known as a data stream, requires dedicated data architectures. In practice, data streams are implemented as message queues or event streams, which we discuss in section 2.2.
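To make the difference between the two implementations concrete, here is a minimal in-memory sketch in Python. It is an illustration under simplifying assumptions, not how production brokers work: the names Event, MessageQueue, EventStream, and drain_stream are hypothetical. The sketch assumes a message queue deletes a message once one consumer reads it, while an event stream keeps an append-only log and lets each consumer track its own offset. Under that assumption, an inference service and an online training job sharing a queue split the events between them, but on a stream each of them sees every event.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical real-time event: a timestamp plus a feature payload."""
    timestamp: float
    features: dict


class MessageQueue:
    """Toy message queue: each message is delivered once, then removed."""

    def __init__(self):
        self._messages = []

    def publish(self, event):
        self._messages.append(event)

    def consume(self):
        # Remove from the front; once consumed, no other consumer can read it.
        return self._messages.pop(0) if self._messages else None


class EventStream:
    """Toy event stream: an append-only log that consumers read by offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # consumer name -> index of the next event to read

    def publish(self, event):
        self._log.append(event)

    def consume(self, consumer):
        # Each consumer advances its own offset; events are never deleted.
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._log):
            return None
        self._offsets[consumer] = offset + 1
        return self._log[offset]


def drain_stream(stream, consumer):
    """Hypothetical helper: read every event the consumer has not seen yet."""
    events = []
    while (event := stream.consume(consumer)) is not None:
        events.append(event)
    return events


if __name__ == "__main__":
    queue, stream = MessageQueue(), EventStream()
    for t in range(3):
        event = Event(timestamp=float(t), features={"clicks": t})
        queue.publish(event)
        stream.publish(event)

    # Two consumers sharing a queue split the messages between them.
    print(queue.consume().timestamp)  # inference service gets event 0.0
    print(queue.consume().timestamp)  # training job gets event 1.0, not 0.0

    # On a stream, inference and training each read every event.
    print([e.timestamp for e in drain_stream(stream, "inference")])
    print([e.timestamp for e in drain_stream(stream, "training")])
```

Real brokers add persistence, partitioning, acknowledgements, and delivery guarantees that this sketch deliberately omits; it only isolates the consume-once versus read-by-offset distinction.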

2.1 Ingesting real-time events

2.1.1 Selecting a data source

2.1.2 Creating real-time events

2.2 Processing real-time data

2.2.1 Message queues

2.2.2 Problems with message queues

2.2.3 Event streams

2.3 Benefits of event-driven architecture for machine learning

2.3.1 Synchronous machine learning

2.3.2 Event-driven machine learning

2.4 Summary