2 Finding, collecting, and exploring tabular datasets
This chapter covers
- Examining row and column characteristics in a tabular dataset
- Pointing out possible pathologies and remedies for tabular datasets
- Finding tabular data externally on the Internet and internally in organizations
- Exploring data to solve common problems in tabular data
In chapter 1, we introduced the differences between deep learning and machine learning approaches to tabular data problems. In this chapter, we set models aside for a while and take a more hands-on, in-depth look at what tabular datasets are. We focus on how tabular data is structured and on the characteristics common to all tabular problems. This matters because the structure and characteristics of tabular data determine which methods and tools will work in your projects.
Our first topic is what tabular datasets have in common, which is what makes it possible to write a book that discusses tabular data and models in general terms. The commonalities lie in how the data is structured, not in the varied content of the features, which is a matter of domain knowledge. Hence, we look at how rows and columns organize a tabular dataset, what data types they contain, and what problems may arise when you get tabular data from the web or from the data repositories in your organization.