chapter seven

7 Project overview and system architecture

This chapter covers

Providing a high-level design of our system
Optimizing the data ingestion component for multiple epochs of the dataset
Deciding which distributed model training strategy best minimizes overhead
Adding model server replicas for high-performance model serving
Accelerating the end-to-end workflow of our machine learning system

In the previous chapters, we learned to choose and apply the appropriate patterns for building and deploying distributed machine learning systems, gaining practical experience in managing and automating machine learning tasks. In chapter 2, I introduced a couple of practical patterns that can be incorporated into data ingestion. Data ingestion is usually the first process in a distributed machine learning system. It is responsible for monitoring incoming data and performing the preprocessing steps needed to prepare that data for model training.
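The following minimal Python sketch makes the idea of an ingestion step concrete. It assumes the preprocessed dataset is small enough to hold in memory. The class `CachedIngestion` and its `source` and `preprocess` parameters are hypothetical illustrations, not the implementation used in this book. The class preprocesses raw records on the first pass and then replays the cached results on later epochs. This is one way to avoid repeating the same preprocessing work when a model trains over the dataset multiple times.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class CachedIngestion:
    """Hypothetical ingestion step: preprocess raw records once, then replay
    the cached results on later epochs instead of preprocessing again.
    Assumes the preprocessed dataset fits in memory."""

    def __init__(self, source: Callable[[], Iterable[str]],
                 preprocess: Callable[[str], List[float]]) -> None:
        self.source = source          # returns a fresh iterable of raw records
        self.preprocess = preprocess  # per-record preprocessing function
        self._cache: Optional[List[List[float]]] = None
        self.preprocess_calls = 0     # counts how often preprocessing actually ran

    def epoch(self) -> Iterator[List[float]]:
        if self._cache is not None:
            # Later epochs: serve preprocessed records straight from memory.
            yield from self._cache
            return
        cache: List[List[float]] = []
        for record in self.source():
            self.preprocess_calls += 1
            features = self.preprocess(record)
            cache.append(features)
            yield features
        # Publish the cache only after the first pass has completed fully,
        # so a partially consumed epoch never leaves an incomplete cache.
        self._cache = cache


# Usage: three epochs over a tiny CSV-like dataset.
raw = ["1,2", "3,4", "5,6"]
ingest = CachedIngestion(lambda: raw,
                         lambda line: [float(x) for x in line.split(",")])
for _ in range(3):
    batch = list(ingest.epoch())
print(ingest.preprocess_calls)  # 3: preprocessing ran only during the first epoch
```

The design choice here is to trade memory for compute. Preprocessing runs once per record, and every later epoch reads from the cache. A real system might cache to disk or to a shared store instead.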

7.1 Project overview

7.1.1 Project background

7.1.2 System components

7.2 Data ingestion

7.2.1 The problem

7.2.2 The solution

7.2.3 Exercises

7.3 Model training

7.3.1 The problem

7.3.2 The solution

7.3.3 Exercises

7.4 Model serving

7.4.1 The problem

7.4.2 The solution

7.4.3 Exercises