chapter one

1 Introduction to real-time machine learning

This chapter covers

What is real-time data?
Offline learning vs. online learning
Common use cases for online learning

Real-time machine learning (sometimes referred to as online learning) is an approach that uses real-time data to build predictive systems that adapt to changes in their environment. This differs from batch-wise machine learning (or offline learning), in which historical datasets are carefully curated for training and evaluation. The fundamental assumption in offline learning is that there is some ground truth in the input features that remains stable while models are in production. In reality, the statistical properties of the data, such as probability distributions or relationships between features, are likely to change over time. This shift, known as data drift, can reduce a machine learning model’s accuracy because the model was originally trained on data with different statistical characteristics. Offline models must be retrained routinely to avoid this degradation in accuracy. However, retraining is often both expensive and time-consuming, since it requires iterating over large datasets many times. By the time retrained models are deployed to production, they may already be operating on data assumptions that are no longer true. In other words, offline models cannot adapt to the data changes that occur in real-world environments.
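To make the contrast concrete, here is a minimal sketch of the effect described above. It is an illustration under stated assumptions, not an implementation from this book. It assumes a synthetic one-feature stream whose input-output relationship flips partway through, a least-squares slope fitted once on historical data as the "offline" model, and a per-example stochastic gradient descent update as the "online" model. All names (`make_stream`, `fit_offline`, `run_demo`) and all numeric values are hypothetical.

```python
import random


def make_stream(n, slope, rng):
    """Return n synthetic (x, y) pairs where y = slope * x plus small noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, 0.05)))
    return data


def fit_offline(data):
    """Fit a no-intercept least-squares slope once, on a fixed historical batch."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx


def run_demo(seed=0, learning_rate=0.1):
    """Compare a frozen offline model with an incrementally updated one under drift."""
    rng = random.Random(seed)

    # Assumption: the relationship drifts from slope 2.0 to slope -1.0.
    history = make_stream(500, 2.0, rng)   # data available at training time
    live = make_stream(500, -1.0, rng)     # data seen in production after the drift

    w_offline = fit_offline(history)  # trained once, never updated
    w_online = w_offline              # same starting point, updated per example

    offline_sq_err = 0.0
    online_sq_err = 0.0
    for x, y in live:
        # Score both models before the online model sees the label.
        offline_sq_err += (w_offline * x - y) ** 2
        pred = w_online * x
        online_sq_err += (pred - y) ** 2

        # One stochastic gradient step on the squared error for this example.
        w_online -= learning_rate * 2.0 * (pred - y) * x

    n = len(live)
    return offline_sq_err / n, online_sq_err / n, w_online


if __name__ == "__main__":
    offline_mse, online_mse, w_online = run_demo()
    print(f"offline MSE after drift: {offline_mse:.3f}")
    print(f"online MSE after drift:  {online_mse:.3f}")
    print(f"online slope estimate:   {w_online:.3f}")
```

In this sketch the frozen model keeps predicting with the pre-drift slope, so its error on the live stream stays high, while the incrementally updated model moves toward the new slope and its average error should come out much lower. Real systems face subtler drift and noisier labels, so the gap will rarely be this clean.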

1.1 What is real-time data?

1.2 Offline learning

1.3 Online learning

1.3.1 The model drift problem

1.3.2 The online learning cycle

1.3.3 Offline vs. online learning

1.4 Use cases for real-time machine learning

1.4.1 Recommender systems

1.4.2 Anomaly detection

1.4.3 Reinforcement learning

1.5 Summary