3 Deep Learning System Design

This chapter covers

The high-level architecture of a deep learning system
Key service components of a deep learning system
Shipping the system and models to production
Existing deep learning systems, both commercial and open source

Being asked to design a system can seem like a daunting, open-ended task, even when you are given a clear set of goals. In this chapter, we will walk you through a thought process for getting started with designing a deep learning system.

The first step in any design is gathering requirements and constraints from its users. We will cover this in Section 3.1, where we provide some concrete examples based on our experience. We will also provide a sample architecture, or reference architecture, that you can use as a starting point for your own design.

In Section 3.2, we will explore the roles of the key service components of the high-level architecture from Section 3.1. This will help you understand what each component does and how the components connect with one another to form a deep learning system that users will find helpful in their daily work. Each component described in this section is covered in greater detail in its own chapter later in the book.

In Section 3.3, we will look at shipping the system and its models to production: updating models, updating services, and monitoring. Finally, Section 3.4 surveys existing solutions, namely Amazon SageMaker, Google Vertex AI, Microsoft Azure Machine Learning, and Kubeflow.

3.1 High-level architecture

3.1.1 Gathering goals and requirements

3.1.2 Reference architecture

3.2 Key components

3.2.1 Dataset management

3.2.2 Model training

3.2.3 Model serving

3.2.4 Metadata & artifacts store

3.2.5 Workflow management

3.2.6 Experimentation

3.2.7 Model monitoring

3.2.8 Why do we recommend building components on top of Kubernetes?

3.3 Shipping to production

3.3.1 Updating models

3.3.2 Updating services

3.3.3 Monitoring

3.4 Survey of existing solutions

3.4.1 Amazon SageMaker

3.4.2 Google Vertex AI

3.4.3 Microsoft Azure Machine Learning

3.4.4 Kubeflow

3.5 Summary