9 A complete implementation


This chapter covers

  • Implementing the data ingestion component with TensorFlow
  • Defining the machine learning model and submitting distributed model training jobs
  • Implementing a single-instance model server as well as replicated model servers
  • Building an efficient end-to-end workflow for our machine learning system

In the previous chapter, we covered the basics of the four core technologies we will use in our project: TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. We saw how TensorFlow handles data processing, model building, and model evaluation; we learned the basic concepts of Kubernetes and started a local Kubernetes cluster that serves as our core distributed infrastructure; we used Kubeflow to submit distributed model training jobs to that cluster; and we used Argo Workflows to construct and submit both a basic “hello world” workflow and a more complex DAG-structured workflow. In this chapter, we put those building blocks together into a complete implementation of our machine learning system.

9.1 Data ingestion

9.1.1 Single-node data pipeline
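
As a rough preview of this section, the sketch below shows the overall shape of a single-node tf.data input pipeline. The dataset (Fashion-MNIST loaded through tf.keras.datasets), the batch size, and the shuffle buffer are illustrative assumptions rather than the book's exact choices.

import tensorflow as tf

# A minimal single-node input pipeline (dataset choice is an assumption).
(train_images, train_labels), _ = tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images[..., tf.newaxis] / 255.0  # scale to [0, 1], add a channel axis

dataset = (
    tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    .shuffle(buffer_size=10_000)   # randomize example order each epoch
    .batch(64)                     # group examples into mini-batches
    .prefetch(tf.data.AUTOTUNE)    # overlap preprocessing with training
)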

9.1.2 Distributed data pipeline
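
In the distributed setting, each worker should consume a disjoint slice of the input data. One way to express this with tf.data is manual sharding, sketched below; the worker count and index are plain function arguments here, though in practice they would come from the cluster configuration.

import tensorflow as tf

def make_sharded_dataset(num_workers, worker_index):
    # Build the same pipeline as in section 9.1.1, but keep only every
    # num_workers-th example, starting at this worker's index.
    (train_images, train_labels), _ = tf.keras.datasets.fashion_mnist.load_data()
    train_images = train_images[..., tf.newaxis] / 255.0
    dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    dataset = dataset.shard(num_shards=num_workers, index=worker_index)
    return dataset.shuffle(10_000).batch(64).prefetch(tf.data.AUTOTUNE)

Note that tf.distribute strategies can also shard an input pipeline automatically; the manual version above just makes the mechanics explicit.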

9.2 Model training

9.2.1 Model definition and single-node training
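
As a sketch of what the model definition and a single-node training run might look like, here is a small Keras convolutional network for 28 × 28 grayscale images; the architecture and hyperparameters are illustrative assumptions.

import tensorflow as tf

# A small CNN for 28x28 grayscale inputs (architecture is an assumption).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integer class IDs
    metrics=["accuracy"],
)

model.fit(dataset, epochs=5)  # `dataset` is the pipeline from section 9.1.1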

9.2.2 Distributed model training
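
When the same training code runs as a Kubeflow TFJob, the operator injects the TF_CONFIG environment variable into every worker pod, which lets tf.distribute.MultiWorkerMirroredStrategy discover its peers. A minimal sketch, assuming a small model like the one in section 9.2.1:

import tensorflow as tf

# MultiWorkerMirroredStrategy reads TF_CONFIG (set by the TFJob operator)
# to find the other workers in the cluster.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored on every worker and
    # kept in sync via collective all-reduce after each training step.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Each worker then calls model.fit() on its own shard of the data
# (see make_sharded_dataset in section 9.1.2).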

9.2.3 Model selection
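
Model selection here amounts to comparing candidate training runs on a shared validation metric and keeping the best one. The sketch below is hypothetical: the metrics-file layout, paths, and model names are all assumptions made for illustration.

import json

# Hypothetical layout: each training job wrote /models/<name>/metrics.json
# containing a "val_accuracy" entry for its trained model.
candidates = ["cnn-small", "cnn-large", "dense-baseline"]

def validation_accuracy(name):
    with open(f"/models/{name}/metrics.json") as f:
        return json.load(f)["val_accuracy"]

best = max(candidates, key=validation_accuracy)
print(f"Selected model: {best}")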

9.3 Model serving

9.3.1 Single-server model inference
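
Assuming the selected model is served by a single TensorFlow Serving instance, a client can call its REST predict endpoint as sketched below; the host, port, and model name are assumptions for illustration.

import json
import requests

# One all-zeros 28x28x1 image as a dummy instance (a batch of size 1).
instances = [[[[0.0]] * 28] * 28]

response = requests.post(
    "http://localhost:8501/v1/models/fashion-mnist:predict",  # assumed name and port
    data=json.dumps({"instances": instances}),
)
print(response.json()["predictions"])  # per-class probabilities for each instance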

9.3.2 Replicated model servers
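
To replicate the model server, we can run several identical serving pods behind a single Kubernetes Service, which load-balances inference requests across them. The sketch below scales an assumed Deployment named model-server using the official Kubernetes Python client; the actual manifests and names in the project may differ.

from kubernetes import client, config

config.load_kube_config()  # connect to the local cluster from the previous chapter
apps = client.AppsV1Api()

# Scale the (assumed) model-server Deployment to three replicas.
apps.patch_namespaced_deployment_scale(
    name="model-server",
    namespace="default",
    body={"spec": {"replicas": 3}},
)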

9.4 The end-to-end workflow
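
The end-to-end workflow ties the previous steps (data ingestion, distributed training, model selection, and model serving) together into a single Argo Workflows DAG. One way to submit such a workflow from Python is to shell out to the argo CLI, as sketched below; the manifest file name workflow.yaml is an assumption.

import subprocess

# Submit the DAG-structured workflow manifest and stream its progress.
subprocess.run(["argo", "submit", "--watch", "workflow.yaml"], check=True)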