2 Data ingestion for real-time machine learning
This chapter covers
- Ingesting real-time event data
- Using event-driven architectures to stream and persist real-time data
- Advantages of event-driven architectures for real-time machine learning
Every machine learning project starts with a problem that a model can solve by learning patterns from data. The data available to you, and the data you are able to collect, define what is possible for the project. Real-time machine learning is no different. To start building real-time machine learning applications, you need to understand what real-time data is available to you and how to use it to produce useful predictions.
In the previous chapter, we explored how real-time data instances are used both to generate real-time inferences and to train online models. To accomplish this, the data first has to be ingested and transmitted from its origin to the process where inference happens. This continuous transfer of data, known as a data stream, requires dedicated data architectures. In practice, data streams are implemented as message queues or event streams, which we discuss in section 2.2.
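To preview the difference between the two abstractions, here is a toy, in-memory sketch. It is an illustration under simplifying assumptions, not how production brokers are built: there is no persistence, networking, or concurrency, and the names `Event`, `MessageQueue`, `EventStream`, and the consumer names `"inference"` and `"training"` are hypothetical. The key contrast is that a message queue removes a message once it is consumed, while an event stream keeps an append-only log and lets each consumer track its own read position (offset).

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """A single real-time data instance (hypothetical schema for illustration)."""
    key: str
    value: float


class MessageQueue:
    """Toy message queue: each message is delivered once, then removed."""

    def __init__(self) -> None:
        self._messages: deque[Event] = deque()

    def publish(self, event: Event) -> None:
        self._messages.append(event)

    def consume(self) -> Event | None:
        # Popping removes the message, so no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class EventStream:
    """Toy event stream: an append-only log that consumers read by offset."""

    def __init__(self) -> None:
        self._log: list[Event] = []
        self._offsets: dict[str, int] = {}

    def append(self, event: Event) -> None:
        self._log.append(event)

    def read(self, consumer: str) -> list[Event]:
        # Each consumer keeps its own offset; events are never deleted,
        # so several consumers can read the same history independently.
        start = self._offsets.get(consumer, 0)
        new_events = self._log[start:]
        self._offsets[consumer] = len(self._log)
        return new_events


# Usage: one stream feeding two hypothetical consumers, e.g. a real-time
# inference process and an online training process.
stream = EventStream()
for i in range(3):
    stream.append(Event(key=f"user-{i}", value=float(i)))
print(len(stream.read("inference")))  # 3
print(len(stream.read("training")))   # 3: same events, independent offset
```

The event-stream design suits the pattern from the previous chapter, where the same data instances serve both inference and online training: with a plain message queue, whichever consumer pops an event first would prevent the other from seeing it.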