9 A complete implementation


This chapter covers

  • Implementing the data ingestion component with TensorFlow
  • Defining the machine learning model and submitting distributed model training jobs
  • Implementing a single-instance model server as well as replicated model servers
  • Building an efficient end-to-end workflow for our machine learning system

In the previous chapter, we covered the basics of the four core technologies we will use in our project: TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. We saw how TensorFlow handles data processing, model building, and model evaluation; we learned the basic concepts of Kubernetes and started a local Kubernetes cluster that serves as our core distributed infrastructure; we used Kubeflow to submit distributed model training jobs to that cluster; and we used Argo Workflows to construct and submit both a basic “hello world” workflow and a more complex DAG-structured workflow. In this chapter, we put those building blocks together into a complete implementation of our machine learning system.

9.1 Data ingestion

9.1.1 Single-node data pipeline
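
As a rough preview of this section, the sketch below shows the overall shape of a single-node tf.data input pipeline. The dataset (Fashion-MNIST loaded through tf.keras.datasets), the batch size, and the shuffle buffer are illustrative assumptions rather than the book's exact choices.

import tensorflow as tf

# A minimal single-node input pipeline (dataset choice is an assumption).
(train_images, train_labels), _ = tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images[..., tf.newaxis] / 255.0  # scale to [0, 1], add a channel axis

dataset = (
    tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    .shuffle(buffer_size=10_000)   # randomize example order each epoch
    .batch(64)                     # group examples into mini-batches
    .prefetch(tf.data.AUTOTUNE)    # overlap preprocessing with training
)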

9.1.2 Distributed data pipeline
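
In the distributed setting, each worker should consume a disjoint slice of the input data. One way to express this with tf.data is manual sharding, sketched below; the worker count and index are plain function arguments here, though in practice they would come from the cluster configuration.

import tensorflow as tf

def make_sharded_dataset(num_workers, worker_index):
    # Build the same pipeline as in section 9.1.1, but keep only every
    # num_workers-th example, starting at this worker's index.
    (train_images, train_labels), _ = tf.keras.datasets.fashion_mnist.load_data()
    train_images = train_images[..., tf.newaxis] / 255.0
    dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    dataset = dataset.shard(num_shards=num_workers, index=worker_index)
    return dataset.shuffle(10_000).batch(64).prefetch(tf.data.AUTOTUNE)

Note that tf.distribute strategies can also shard an input pipeline automatically; the manual version above just makes the mechanics explicit.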

9.2 Model training

9.2.1 Model definition and single-node training
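
As a sketch of what the model definition and a single-node training run might look like, here is a small Keras convolutional network for 28 × 28 grayscale images; the architecture and hyperparameters are illustrative assumptions.

import tensorflow as tf

# A small CNN for 28x28 grayscale inputs (architecture is an assumption).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integer class IDs
    metrics=["accuracy"],
)

model.fit(dataset, epochs=5)  # `dataset` is the pipeline from section 9.1.1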

9.2.2 Distributed model training
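
When the same training code runs as a Kubeflow TFJob, the operator injects the TF_CONFIG environment variable into every worker pod, which lets tf.distribute.MultiWorkerMirroredStrategy discover its peers. A minimal sketch, assuming a small model like the one in section 9.2.1:

import tensorflow as tf

# MultiWorkerMirroredStrategy reads TF_CONFIG (set by the TFJob operator)
# to find the other workers in the cluster.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored on every worker and
    # kept in sync via collective all-reduce after each training step.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Each worker then calls model.fit() on its own shard of the data
# (see make_sharded_dataset in section 9.1.2).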

9.2.3 Model selection
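
Model selection here amounts to comparing candidate training runs on a shared validation metric and keeping the best one. The sketch below is hypothetical: the metrics-file layout, paths, and model names are all assumptions made for illustration.

import json

# Hypothetical layout: each training job wrote /models/<name>/metrics.json
# containing a "val_accuracy" entry for its trained model.
candidates = ["cnn-small", "cnn-large", "dense-baseline"]

def validation_accuracy(name):
    with open(f"/models/{name}/metrics.json") as f:
        return json.load(f)["val_accuracy"]

best = max(candidates, key=validation_accuracy)
print(f"Selected model: {best}")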

9.3 Model serving

9.3.1 Single-server model inference
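
Assuming the selected model is served by a single TensorFlow Serving instance, a client can call its REST predict endpoint as sketched below; the host, port, and model name are assumptions for illustration.

import json
import requests

# One all-zeros 28x28x1 image as a dummy instance (a batch of size 1).
instances = [[[[0.0]] * 28] * 28]

response = requests.post(
    "http://localhost:8501/v1/models/fashion-mnist:predict",  # assumed name and port
    data=json.dumps({"instances": instances}),
)
print(response.json()["predictions"])  # per-class probabilities for each instance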

9.3.2 Replicated model servers
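
To replicate the model server, we can run several identical serving pods behind a single Kubernetes Service, which load-balances inference requests across them. The sketch below scales an assumed Deployment named model-server using the official Kubernetes Python client; the actual manifests and names in the project may differ.

from kubernetes import client, config

config.load_kube_config()  # connect to the local cluster from the previous chapter
apps = client.AppsV1Api()

# Scale the (assumed) model-server Deployment to three replicas.
apps.patch_namespaced_deployment_scale(
    name="model-server",
    namespace="default",
    body={"spec": {"replicas": 3}},
)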

9.4 The end-to-end workflow
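
The end-to-end workflow ties the previous steps (data ingestion, distributed training, model selection, and model serving) together into a single Argo Workflows DAG. One way to submit such a workflow from Python is to shell out to the argo CLI, as sketched below; the manifest file name workflow.yaml is an assumption.

import subprocess

# Submit the DAG-structured workflow manifest and stream its progress.
subprocess.run(["argo", "submit", "--watch", "workflow.yaml"], check=True)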