chapter five

5 Preparing and building the model

This chapter covers

Revisiting the dataset and determining which features to use to train the model
Refactoring the dataset to include timeslots when there is no delay
Transforming the dataset into the format expected by the Keras model
Building a Keras model automatically based on the structure of the data
Examining the structure of the model
Setting parameters, including activation and optimization functions and learning rate

This chapter begins with a quick reexamination of the dataset to decide which columns can legitimately be used to train the model. Then we'll go over the transformations required to convert the data from the format we have been working with so far (pandas dataframes) into the format the deep learning model expects. Next, we'll go through the code for the model itself and see how it is built up layer by layer, based on the category of each input column. We wrap up by reviewing methods you can use to examine the structure of the model and the parameters you can adjust to control how the model is trained.

5.1 Data leakage and features that are fair game for training the model

5.2 Domain expertise and minimal scoring tests to prevent data leakage
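One way to picture a minimal scoring test is as a check that every feature the model trains on is something a user could actually supply at the moment they ask for a prediction. The sketch below hard-codes that set of "known at scoring time" fields, which is an assumption: in practice the set comes from domain expertise. The column names are hypothetical stand-ins for the streetcar delay dataset, and the functions `find_leaky_features` and `minimal_scoring_test` are illustrative, not part of any library.

```python
# Hypothetical sketch of a minimal scoring test for data leakage.
# Column names are illustrative assumptions, not the dataset's exact schema.

# Fields a user could plausibly supply when requesting a prediction
# (assumption: route, direction, and date/time information only).
AVAILABLE_AT_SCORING = {"Route", "Direction", "Year", "Month",
                        "DayOfMonth", "DayOfWeek", "Hour"}


def find_leaky_features(train_features, available=AVAILABLE_AT_SCORING):
    """Return training features that would not be known at scoring time."""
    return sorted(set(train_features) - set(available))


def minimal_scoring_test(train_features, available=AVAILABLE_AT_SCORING):
    """Raise ValueError if any training feature is unavailable at scoring time."""
    leaky = find_leaky_features(train_features, available)
    if leaky:
        raise ValueError(f"Features unavailable at scoring time: {leaky}")
    return True


# A delay's duration is only known after the delay happens, so it is flagged.
leaky = find_leaky_features(["Route", "Hour", "Min Delay"])  # ['Min Delay']
```

The check is deliberately simple: it does not prove a feature is safe, it only catches features that obviously depend on the outcome being predicted.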

5.3 Preventing data leakage in the streetcar delay prediction problem

5.4 Code for exploring Keras and building the model

5.5 Deriving the dataframe to use to train the model
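A common way to add the timeslots in which no delay occurred is to build every combination of the key columns that identify a timeslot and left-merge the recorded delays onto that grid, so combinations with no matching delay end up with a count of zero. The pandas sketch below assumes a hypothetical delay dataframe keyed by route, direction, and hour; the helper `add_no_delay_slots`, its column names, and the `count` and `target` columns are illustrative, not the chapter's exact code.

```python
import itertools

import pandas as pd


def add_no_delay_slots(delays, slot_values):
    """Expand delay records into one row per timeslot, including slots with no delay.

    delays: dataframe with one row per recorded delay (hypothetical schema).
    slot_values: dict mapping each key column to every value it can take.
    """
    key_cols = list(slot_values)
    # Count recorded delays in each timeslot that has at least one delay.
    counts = delays.groupby(key_cols).size().reset_index(name="count")
    # Full grid of timeslots: the Cartesian product of all key values.
    grid = pd.DataFrame(
        list(itertools.product(*slot_values.values())), columns=key_cols
    )
    # A left merge keeps every timeslot; slots with no delay get NaN for count.
    merged = grid.merge(counts, on=key_cols, how="left")
    merged["count"] = merged["count"].fillna(0).astype(int)
    # Binary target: 1 if at least one delay was recorded in the timeslot.
    merged["target"] = (merged["count"] > 0).astype(int)
    return merged


delays = pd.DataFrame({"Route": [501, 501, 504],
                       "Direction": ["e", "e", "w"],
                       "Hour": [8, 8, 17]})
slots = {"Route": [501, 504], "Direction": ["e", "w"], "Hour": [8, 17]}
result = add_no_delay_slots(delays, slots)  # 8 timeslots, 2 of them delayed
```

Building the grid from explicit value lists, rather than from the values that happen to appear in the delay records, ensures that timeslots which never saw a delay still show up. The expanded dataframe is typically much larger than the original, with many more rows where `target` is 0 than where it is 1.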

5.6 Transforming the dataframe into the format expected by the Keras model
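A Keras functional-API model with several named inputs can be trained on a dictionary that maps each input's name to a NumPy array. A minimal sketch of that transformation follows, assuming the model defines one input per feature column and names each input after its column. The helper names and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def df_to_keras_inputs(df: pd.DataFrame, feature_cols):
    """Map each feature column to a NumPy array, keyed by column name.

    Assumes the model has one named Input per column, with name == column.
    """
    return {col: np.asarray(df[col]) for col in feature_cols}


def split_features_target(df: pd.DataFrame, feature_cols, target_col):
    """Return (inputs_dict, target_array), a shape suitable for model.fit()."""
    return df_to_keras_inputs(df, feature_cols), np.asarray(df[target_col])


# Hypothetical, already label-encoded data.
df = pd.DataFrame({"Route": [0, 1, 2], "Hour": [8, 9, 10], "target": [1, 0, 1]})
X, y = split_features_target(df, ["Route", "Hour"], "target")
```

Keying the dictionary by column name lets Keras route each array to the matching input layer, so the order of the columns does not matter.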

5.7 A brief history of Keras and TensorFlow

5.8 Migrating from TensorFlow 1.x to TensorFlow 2

5.9 TensorFlow vs. PyTorch

5.10 The structure of a deep learning model in Keras

5.11 How the data structure defines the Keras model

5.12 The power of embeddings
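The basic mechanics of an embedding can be shown without Keras. An embedding layer holds a weight matrix with one row per category, and looking up a category returns that category's row. The NumPy sketch below uses random weights in place of trained ones and checks that this lookup produces the same result as multiplying a one-hot vector by the matrix, while yielding a short dense vector instead of a long sparse one. The sizes are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_categories = 5   # e.g. number of distinct routes (arbitrary here)
embedding_dim = 3  # length of each dense embedding vector

# In a trained model these weights are learned; here they are random.
weights = rng.normal(size=(n_categories, embedding_dim))


def embed(category_ids):
    """Look up the embedding row for each integer-encoded category."""
    return weights[np.asarray(category_ids)]


def one_hot(category_ids):
    """Encode integer categories as one-hot rows of length n_categories."""
    return np.eye(n_categories)[np.asarray(category_ids)]


ids = [0, 3, 3, 4]
dense = embed(ids)                     # shape (4, 3): one dense row per id
via_one_hot = one_hot(ids) @ weights   # same values, computed the slow way
```

In Keras the same lookup is performed by `tf.keras.layers.Embedding(input_dim, output_dim)`, whose weights are trained along with the rest of the model. After training, categories that behave similarly with respect to the target can end up with similar vectors.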