7 Project overview and system architecture

This chapter covers

  • Providing a high-level design of our system
  • Optimizing the data ingestion component for multiple epochs of the dataset
  • Deciding which distributed model training strategy best minimizes overhead
  • Adding model server replicas for high-performance model serving
  • Accelerating the end-to-end workflow of our machine learning system

In the previous chapters, we learned how to choose and apply the right patterns for building and deploying distributed machine learning systems, and we gained practical experience managing and automating machine learning tasks. In chapter 2, I introduced a couple of practical patterns that can be incorporated into data ingestion, which is usually the first process in a distributed machine learning system and is responsible for monitoring incoming data and performing the preprocessing steps needed to prepare it for model training.

7.1 Project overview

7.1.1 Project background

7.1.2 System components

7.2 Data ingestion

7.2.1 The problem

7.2.2 The solution

7.2.3 Exercises

7.3 Model training

7.3.1 The problem

7.3.2 The solution

7.3.3 Exercises

7.4 Model serving

7.4.1 The problem

7.4.2 The solution

7.4.3 Exercises