9 Deep learning best practices

This chapter covers

  • An introduction to the Kuala Lumpur real estate dataset
  • Processing the dataset
  • Defining the deep learning model
  • Training the deep learning model
  • Exercising the deep learning model

In chapter 8 we examined a set of stacks for doing deep learning with tabular data. In this chapter, we use one of these stacks, Keras, to explore some best practices for deep learning with tabular data, including how to prepare the data, how to design the model, and how to train the model. We introduce a new problem to demonstrate all these best practices: predicting whether real estate properties in Kuala Lumpur will have a price above or below the median price for the market. We selected this dataset because it is messier and more challenging to prepare than the Airbnb NYC dataset we have used so far. Consequently, we’ll be able to demonstrate a wider range of techniques for applying deep learning to tabular datasets.
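The prediction target described above — whether a listing's price is above or below the market median — can be sketched in a few lines of pandas. This is a minimal illustration with a made-up mini-frame and a hypothetical `price` column; the real dataset's Price column arrives as messy strings and needs the cleanup covered in section 9.2.2 before this step applies.

```python
import pandas as pd

# Hypothetical stand-in for the Kuala Lumpur listings after cleanup;
# column name "price" is assumed for illustration only.
df = pd.DataFrame({"price": [350_000, 1_200_000, 780_000, 560_000, 2_400_000]})

# Binary target: 1 if the listing is priced above the market median, else 0.
median_price = df["price"].median()
df["above_median"] = (df["price"] > median_price).astype(int)

print(median_price)                    # → 780000.0
print(df["above_median"].tolist())     # → [0, 1, 0, 0, 1]
```

Framing the task as binary classification against the median (rather than regressing on price) keeps the two classes balanced by construction, which simplifies both training and evaluation.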

If you are new to training deep learning models, the examples in this chapter will help you learn some best practices. If you already have extensive experience defining and training deep learning architectures, this chapter can still serve as a useful review of those principles.

9.1 Introduction to the Kuala Lumpur real estate dataset

9.2 Processing the dataset

9.2.1 Processing Bathrooms, Car Parks, Furnishing, Property Type, and Location columns

9.2.2 Processing the Price column

9.2.3 Processing the Rooms column

9.2.4 Processing the Size column

9.3 Defining the deep learning model

9.3.1 Contrasting the custom layer and Keras preprocessing layer approaches

9.3.2 Examining the code for model definition using Keras preprocessing layers

9.4 Training the deep learning model

9.4.1 Cross-validation in the training process

9.4.2 Regularization in the training process

9.4.3 Normalization in the training process

9.5 Exercising the deep learning model

9.5.1 Rationale for exercising the trained model on some new data points

9.5.2 Exercising the trained model on some new data points

Summary