This chapter covers
- Using config files in Python
- Ingesting XLS files into Pandas dataframes and saving dataframes with pickle
- Exploring the input dataset
- Categorizing data into continuous, categorical, and text categories
- Correcting gaps and errors in the dataset
- Calculating data volume for a successful deep learning project
In this chapter, you’ll learn how to bring tabular structured data from an XLS file into your Python program and how to use Python’s pickle facility to save your data structures between Python sessions. You’ll learn how to sort the columns of the structured data into the three categories needed by the deep learning model: continuous, categorical, and text. You’ll also learn how to detect and deal with the gaps and errors in the dataset that must be corrected before it can be used to train a deep learning model. Finally, you’ll get some pointers on how to assess whether a given dataset is large enough for deep learning.
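The following is a minimal, dependency-free sketch of two of these ideas: sorting columns into continuous, categorical, and text, then saving and restoring the result with pickle. It assumes the rows are plain dictionaries rather than a pandas dataframe. The column names, the sample values, and the word-count rule that separates categorical columns from text columns are all hypothetical illustrations, not the chapter’s own method.

```python
import os
import pickle
import tempfile

# Hypothetical sample rows standing in for data ingested from an XLS file.
rows = [
    {"region": "north", "price": 10.5, "review": "Arrived late but worked fine"},
    {"region": "south", "price": 7.0, "review": "Great value for the money"},
    {"region": "north", "price": 12.25, "review": "Stopped working after a week"},
]


def categorize_columns(rows, max_mean_words=2):
    """Assign each column to 'continuous', 'categorical', or 'text'.

    Hypothetical heuristic for this sketch: numeric columns are continuous;
    string columns averaging more than max_mean_words words per value are
    text; all remaining columns are categorical.
    """
    categories = {"continuous": [], "categorical": [], "text": []}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            categories["continuous"].append(column)
            continue
        mean_words = sum(len(str(v).split()) for v in values) / len(values)
        if mean_words > max_mean_words:
            categories["text"].append(column)
        else:
            categories["categorical"].append(column)
    return categories


def save_with_pickle(obj, path):
    # Serialize any picklable Python object to a binary file.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_with_pickle(path):
    # Restore the object saved by save_with_pickle.
    with open(path, "rb") as f:
        return pickle.load(f)


column_categories = categorize_columns(rows)

# Round-trip the data through a temporary pickle file, as if between sessions.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "dataset.pkl")
    save_with_pickle({"rows": rows, "column_categories": column_categories}, pickle_path)
    restored = load_with_pickle(pickle_path)

print(restored["column_categories"])
# {'continuous': ['price'], 'categorical': ['region'], 'text': ['review']}
```

With pandas, `DataFrame.to_pickle(path)` and `pandas.read_pickle(path)` give the same round trip for a whole dataframe, and `pandas.read_excel` handles the XLS ingestion step (legacy `.xls` files typically need the `xlrd` package installed). Because unpickling can execute arbitrary code, load pickle files only from sources you trust.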
After you have cloned the GitHub repo associated with this book (http://mng.bz/v95x), you’ll find the code for exploring and cleansing the data in the notebooks subdirectory. The next listing shows the files that contain the code described in this chapter.