This chapter covers
- Providing a high-level design of our system
- Optimizing the data ingestion component for multiple epochs of the dataset
- Deciding which distributed model training strategy best minimizes overhead
- Adding model server replicas for high-performance model serving
- Accelerating the end-to-end workflow of our machine learning system
In the previous chapters, we learned to choose and apply the correct patterns for building and deploying distributed machine learning systems, and we gained practical experience managing and automating machine learning tasks. In chapter 2, I introduced a couple of practical patterns that can be incorporated into data ingestion. Data ingestion is usually the first process in a distributed machine learning system and is responsible for monitoring incoming data and performing the necessary preprocessing steps to prepare for model training.