This chapter covers
- Understanding the common types of data formats and storage for training datasets
- Using TensorFlow TFRecord format and tf.data for dataset representations and transformations
- Constructing a data pipeline for feeding a model during training
- Preprocessing using TF.Keras preprocessing layers, layer subclassing, and TFX components
- Using data augmentation to train models for translational, scale, and viewport invariance
You’ve built your model, using composable models as needed. You’ve trained and retrained it, and tested and retested it. Now you’re ready to launch it. In these last two chapters, you’ll learn how to launch a model. More specifically, you’ll migrate a model from the preparation and exploratory phases to a production environment, using the TensorFlow 2.x ecosystem in conjunction with TensorFlow Extended (TFX).
In a production environment, operations such as training and deploying are executed as pipelines. Pipelines have the advantage of being configurable, reusable, and version-controlled, and of retaining history. Because a production pipeline is so extensive, we need two chapters to cover it. This chapter focuses on the data pipeline components, which make up the frontend of a production pipeline. The next chapter covers the training and deployment components.
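To give a flavor of what a data pipeline looks like before diving into the details, here is a minimal sketch using `tf.data`, the TensorFlow API this chapter builds on. The in-memory tensors and the doubling transformation are placeholders for illustration only, not the chapter's actual dataset or preprocessing.

```python
import tensorflow as tf

# Placeholder in-memory data; in practice this would come from TFRecord files.
features = tf.constant([[1.0], [2.0], [3.0], [4.0]])
labels = tf.constant([0, 1, 0, 1])

# Build a tf.data pipeline: source -> transformation -> batching -> prefetch.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.map(lambda x, y: (x * 2.0, y))   # example preprocessing step
dataset = dataset.shuffle(buffer_size=4)           # randomize example order
dataset = dataset.batch(2)                         # group examples into batches
dataset = dataset.prefetch(tf.data.AUTOTUNE)       # overlap prep with training

# Iterating yields batches ready to feed a model during training.
for batch_x, batch_y in dataset:
    print(batch_x.shape, batch_y.shape)
```

Each stage returns a new `Dataset`, so transformations compose into a single reusable pipeline object that can be passed directly to `model.fit`.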