3 Preparing the data, part 1: Exploring and cleansing the data

 

This chapter covers

  • Using config files in Python
  • Ingesting XLS files into Pandas dataframes and saving dataframes with pickle
  • Exploring the input dataset
  • Categorizing data into continuous, categorical, and text categories
  • Correcting gaps and errors in the dataset
  • Calculating data volume for a successful deep learning project

In this chapter, you’ll learn how to bring tabular structured data from an XLS file into your Python program and how to use Python’s pickle facility to save your data structures between sessions. You’ll see how to classify structured data into the three categories the deep learning model needs: continuous, categorical, and text. You’ll also learn how to detect and correct the gaps and errors in the dataset that must be fixed before it can be used to train a deep learning model. Finally, you’ll get some pointers on how to assess whether a given dataset is large enough for deep learning.

3.1 Code for exploring and cleansing the data

After you have cloned the GitHub repo associated with this book (http://mng.bz/v95x), the code related to exploring and cleansing the data will be in the notebooks subdirectory. The next listing shows the files that contain the code described in this chapter.

Listing 3.1 Code in the repo related to exploring and cleansing the data

3.2 Using config files with Python
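As a minimal sketch of the idea behind this section, Python’s standard-library configparser module can read INI-style settings so that file names and options live outside your code. The config contents, file name, and key names below are illustrative assumptions, not taken from the book:

```python
import configparser

# Illustrative config contents; in practice these would live in a
# file such as config.ini (the file name and keys are assumptions)
CONFIG_TEXT = """
[files]
input_xls = input_data.xls
pickled_dataframe = df.pkl

[options]
save_intermediate = yes
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)  # use config.read("config.ini") for a real file

input_xls = config["files"]["input_xls"]
save_intermediate = config["options"].getboolean("save_intermediate")

print(input_xls)          # input_data.xls
print(save_intermediate)  # True
```

Keeping these values in a config file means you can point the notebook at a different input file without editing the code itself.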

 
 

3.3 Ingesting XLS files into a Pandas dataframe
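A hedged sketch of the core call this section covers: pandas can ingest an Excel file into a dataframe with read_excel. The tiny dataset, column names, and file name below are illustrative assumptions (and writing .xlsx requires the openpyxl package, which typically ships alongside pandas):

```python
import pandas as pd

# Hypothetical small dataset standing in for the chapter's XLS input
df = pd.DataFrame({
    "Route": [501, 504, 501],
    "Delay": [10, 0, 25],
})

# Round-trip through an Excel file (the file name is illustrative)
df.to_excel("sample.xlsx", index=False)
df2 = pd.read_excel("sample.xlsx")

print(df2.shape)  # (3, 2)
```

In practice you would call pd.read_excel directly on the downloaded XLS file, optionally passing sheet_name to pick a worksheet.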

 
 

3.4 Using pickle to save your Pandas dataframe from one session to another
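As a minimal sketch of the technique this section describes, pandas dataframes have built-in pickle support: to_pickle serializes the dataframe (including its dtypes) to disk at the end of one session, and read_pickle restores it in the next. The file name here is an illustrative assumption:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Save the dataframe at the end of one session...
df.to_pickle("df.pkl")  # file name is an illustrative assumption

# ...and reload it, with dtypes intact, at the start of the next
df_restored = pd.read_pickle("df.pkl")

print(df.equals(df_restored))  # True
```

This avoids re-running the (slow) XLS ingestion step every time you reopen the notebook.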

 
 
 

3.5 Exploring the data
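A few standard pandas calls cover the kind of first-look exploration this section is about: peeking at rows, summarizing numeric columns, counting missing values, and tallying category frequencies. The sample dataframe is an illustrative assumption standing in for the chapter’s dataset:

```python
import pandas as pd

# Hypothetical sample standing in for the chapter's dataset
df = pd.DataFrame({
    "Delay": [10, 0, 25, None],
    "Day": ["Mon", "Tue", "Mon", "Wed"],
})

print(df.head())                  # first few rows
print(df.describe())              # summary stats for numeric columns
print(df.isnull().sum())          # missing values per column
print(df["Day"].value_counts())   # frequency of each category
```

Together these give a quick picture of the dataset’s shape, ranges, gaps, and category balance before any cleanup begins.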

 

3.6 Categorizing data into continuous, categorical, and text categories
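One simple way to sort columns into the three categories is a cardinality heuristic: numeric columns are treated as continuous, string columns with few distinct values as categorical, and high-cardinality string columns as text. This heuristic (and the max_categories cutoff) is an illustrative assumption, not necessarily the exact rule the chapter uses:

```python
import pandas as pd

def categorize_columns(df, max_categories=20):
    """Rough heuristic: numeric -> continuous; low-cardinality
    strings -> categorical; everything else -> text."""
    continuous, categorical, text = [], [], []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continuous.append(col)
        elif df[col].nunique() <= max_categories:
            categorical.append(col)
        else:
            text.append(col)
    return continuous, categorical, text

# Hypothetical sample dataframe
df = pd.DataFrame({
    "Delay": [10, 0, 25],
    "Day": ["Mon", "Tue", "Mon"],
    "Incident": ["door jam", "late operator", "signal fault"],
})
cont, cat, txt = categorize_columns(df, max_categories=2)
print(cont, cat, txt)  # ['Delay'] ['Day'] ['Incident']
```

The cutoff is a judgment call: a column of weekday names is clearly categorical, while a free-form incident description behaves like text.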

 
 
 
 

3.7 Cleaning up problems in the dataset: missing data, errors, and guesses
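As a hedged sketch of the cleanup this section covers, the pandas idioms below fill missing continuous values, give missing categories an explicit placeholder, and correct an obviously impossible value. The sample data and the specific corrections are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "Delay": [10, None, 25, -5],       # None = missing; -5 = an obvious error
    "Day": ["Mon", "Tue", None, "Mon"],
})

# Fill missing continuous values with 0 and missing categories
# with an explicit placeholder value
df["Delay"] = df["Delay"].fillna(0)
df["Day"] = df["Day"].fillna("missing")

# A negative delay is impossible, so clip it to 0
# (one plausible guess at a correction)
df.loc[df["Delay"] < 0, "Delay"] = 0

print(df)
```

The right fill value is dataset-specific: a placeholder category keeps the fact that the value was missing visible to the model, whereas silently dropping rows would discard signal.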

 

3.8 Finding out how much data deep learning needs
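There is no universal answer to how much data deep learning needs, but a back-of-the-envelope check can flag clearly undersized datasets. The rows-per-feature floor and the example numbers below are generic illustrative heuristics, not figures from this chapter:

```python
def enough_data(n_rows, n_features, rows_per_feature_floor=1000):
    """Back-of-the-envelope check: is there a comfortable number of
    rows per input feature? The floor of 1,000 rows per feature is an
    illustrative heuristic, not a figure from this chapter."""
    return n_rows / n_features >= rows_per_feature_floor

print(enough_data(70_000, 12))  # True  (~5,833 rows per feature)
print(enough_data(500, 12))     # False (~42 rows per feature)
```

A check like this is only a screening step; the real test is whether the trained model generalizes on held-out data.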

 
 
 
 

Summary

 
 
 
 