3 Preparing the data, part 1: Exploring and cleansing the data

 

This chapter covers

  • Using config files in Python
  • Ingesting XLS files into Pandas dataframes and saving dataframes with pickle
  • Exploring the input dataset
  • Categorizing data into continuous, categorical, and text categories
  • Correcting gaps and errors in the dataset
  • Calculating data volume for a successful deep learning project

In this chapter, you’ll learn how to bring tabular structured data from an XLS file into your Python program and how to use Python’s pickle facility to save your data structures between sessions. You’ll see how to classify structured data into the three categories the deep learning model needs: continuous, categorical, and text. You’ll also learn how to detect and correct the gaps and errors in the dataset that must be fixed before it can be used to train a deep learning model. Finally, you’ll get some pointers on how to assess whether a given dataset is large enough for deep learning.

3.1 Code for exploring and cleansing the data

After you have cloned the GitHub repo associated with this book (http://mng.bz/v95x), the code related to exploring and cleansing the data will be in the notebooks subdirectory. The next listing shows the files that contain the code described in this chapter.

Listing 3.1 Code in the repo related to exploring and cleansing the data

3.2 Using config files with Python
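As a minimal sketch of the idea behind this section, Python’s standard-library configparser module can read INI-style settings so that file names and options live outside your code. The config contents, file name, and key names below are illustrative assumptions, not taken from the book:

```python
import configparser

# Illustrative config contents; in practice these would live in a
# file such as config.ini (the file name and keys are assumptions)
CONFIG_TEXT = """
[files]
input_xls = input_data.xls
pickled_dataframe = df.pkl

[options]
save_intermediate = yes
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)  # use config.read("config.ini") for a real file

input_xls = config["files"]["input_xls"]
save_intermediate = config["options"].getboolean("save_intermediate")

print(input_xls)          # input_data.xls
print(save_intermediate)  # True
```

Keeping these values in a config file means you can point the notebook at a different input file without editing the code itself.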

 
 

3.3 Ingesting XLS files into a Pandas dataframe
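A hedged sketch of the core call this section covers: pandas can ingest an Excel file into a dataframe with read_excel. The tiny dataset, column names, and file name below are illustrative assumptions (and writing .xlsx requires the openpyxl package, which typically ships alongside pandas):

```python
import pandas as pd

# Hypothetical small dataset standing in for the chapter's XLS input
df = pd.DataFrame({
    "Route": [501, 504, 501],
    "Delay": [10, 0, 25],
})

# Round-trip through an Excel file (the file name is illustrative)
df.to_excel("sample.xlsx", index=False)
df2 = pd.read_excel("sample.xlsx")

print(df2.shape)  # (3, 2)
```

In practice you would call pd.read_excel directly on the downloaded XLS file, optionally passing sheet_name to pick a worksheet.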

 
 

3.4 Using pickle to save your Pandas dataframe from one session to another
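As a minimal sketch of the technique this section describes, pandas dataframes have built-in pickle support: to_pickle serializes the dataframe (including its dtypes) to disk at the end of one session, and read_pickle restores it in the next. The file name here is an illustrative assumption:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Save the dataframe at the end of one session...
df.to_pickle("df.pkl")  # file name is an illustrative assumption

# ...and reload it, with dtypes intact, at the start of the next
df_restored = pd.read_pickle("df.pkl")

print(df.equals(df_restored))  # True
```

This avoids re-running the (slow) XLS ingestion step every time you reopen the notebook.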

 
 
 

3.5 Exploring the data
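A few standard pandas calls cover the kind of first-look exploration this section is about: peeking at rows, summarizing numeric columns, counting missing values, and tallying category frequencies. The sample dataframe is an illustrative assumption standing in for the chapter’s dataset:

```python
import pandas as pd

# Hypothetical sample standing in for the chapter's dataset
df = pd.DataFrame({
    "Delay": [10, 0, 25, None],
    "Day": ["Mon", "Tue", "Mon", "Wed"],
})

print(df.head())                  # first few rows
print(df.describe())              # summary stats for numeric columns
print(df.isnull().sum())          # missing values per column
print(df["Day"].value_counts())   # frequency of each category
```

Together these give a quick picture of the dataset’s shape, ranges, gaps, and category balance before any cleanup begins.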

 

3.6 Categorizing data into continuous, categorical, and text categories
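One simple way to sort columns into the three categories is a cardinality heuristic: numeric columns are treated as continuous, string columns with few distinct values as categorical, and high-cardinality string columns as text. This heuristic (and the max_categories cutoff) is an illustrative assumption, not necessarily the exact rule the chapter uses:

```python
import pandas as pd

def categorize_columns(df, max_categories=20):
    """Rough heuristic: numeric -> continuous; low-cardinality
    strings -> categorical; everything else -> text."""
    continuous, categorical, text = [], [], []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continuous.append(col)
        elif df[col].nunique() <= max_categories:
            categorical.append(col)
        else:
            text.append(col)
    return continuous, categorical, text

# Hypothetical sample dataframe
df = pd.DataFrame({
    "Delay": [10, 0, 25],
    "Day": ["Mon", "Tue", "Mon"],
    "Incident": ["door jam", "late operator", "signal fault"],
})
cont, cat, txt = categorize_columns(df, max_categories=2)
print(cont, cat, txt)  # ['Delay'] ['Day'] ['Incident']
```

The cutoff is a judgment call: a column of weekday names is clearly categorical, while a free-form incident description behaves like text.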

 
 
 
 

3.7 Cleaning up problems in the dataset: missing data, errors, and guesses
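As a hedged sketch of the cleanup this section covers, the pandas idioms below fill missing continuous values, give missing categories an explicit placeholder, and correct an obviously impossible value. The sample data and the specific corrections are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "Delay": [10, None, 25, -5],       # None = missing; -5 = an obvious error
    "Day": ["Mon", "Tue", None, "Mon"],
})

# Fill missing continuous values with 0 and missing categories
# with an explicit placeholder value
df["Delay"] = df["Delay"].fillna(0)
df["Day"] = df["Day"].fillna("missing")

# A negative delay is impossible, so clip it to 0
# (one plausible guess at a correction)
df.loc[df["Delay"] < 0, "Delay"] = 0

print(df)
```

The right fill value is dataset-specific: a placeholder category keeps the fact that the value was missing visible to the model, whereas silently dropping rows would discard signal.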

 

3.8 Finding out how much data deep learning needs
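There is no universal answer to how much data deep learning needs, but a back-of-the-envelope check can flag clearly undersized datasets. The rows-per-feature floor and the example numbers below are generic illustrative heuristics, not figures from this chapter:

```python
def enough_data(n_rows, n_features, rows_per_feature_floor=1000):
    """Back-of-the-envelope check: is there a comfortable number of
    rows per input feature? The floor of 1,000 rows per feature is an
    illustrative heuristic, not a figure from this chapter."""
    return n_rows / n_features >= rows_per_feature_floor

print(enough_data(70_000, 12))  # True  (~5,833 rows per feature)
print(enough_data(500, 12))     # False (~42 rows per feature)
```

A check like this is only a screening step; the real test is whether the trained model generalizes on held-out data.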

 
 
 
 

Summary

 
 
 
 