This chapter covers
- Row and column characteristics in a tabular dataset
- Possible pathologies and remedies for tabular datasets
- Finding tabular data externally on the internet and internally in organizations
- Exploring data to solve common problems in tabular data
Tabular data can describe practically anything, from low-level scientific research to consumer behavior on a website to the statistics in your fantasy sports league. In the end, though, the commonalities in tabular data outweigh the differences, and you can accomplish most data analysis tasks by applying standard approaches and tools, even without much domain expertise.
In this chapter, we’ll look at how to gather and prepare tabular datasets. We’ll also walk through a practical data exploration that shows the steps you can take to examine data from different viewpoints: by rows, by columns, in light of the relationships between features, and in terms of the overall distribution of values in the dataset. For that example, we will use a simple toy dataset, the Auto MPG Data Set, which is freely available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/dataset/9/auto+mpg).
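To make those four viewpoints concrete before we go further, here is a minimal sketch that uses only the Python standard library. The five rows are made-up values that merely imitate a few Auto MPG columns (`mpg`, `cylinders`, `horsepower`, `weight`), and the `?` marker for a missing value is an assumption about how gaps are encoded. The helper names `load_rows`, `column`, and `pearson` are hypothetical and chosen only for this illustration.

```python
import csv
import io
import statistics

# Hypothetical sample: made-up values that only imitate the Auto MPG layout.
SAMPLE_CSV = """mpg,cylinders,horsepower,weight
18.0,8,130,3500
15.0,8,165,3700
24.0,4,95,2400
31.0,4,?,2000
27.0,4,88,2100
"""


def load_rows(text):
    """Parse CSV text into dicts of floats, mapping '?' to None (assumed missing marker)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (None if value == "?" else float(value)) for name, value in record.items()}
        for record in reader
    ]


def column(rows, name):
    """Return the non-missing values of one column."""
    return [row[name] for row in rows if row[name] is not None]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


rows = load_rows(SAMPLE_CSV)

# By rows: which records contain at least one missing value?
incomplete_rows = [index for index, row in enumerate(rows) if None in row.values()]

# By columns: how many values are missing in each feature?
missing_per_column = {name: sum(row[name] is None for row in rows) for name in rows[0]}

# Relationship between features: do heavier cars tend to have lower mpg?
complete_pairs = [
    (row["weight"], row["mpg"])
    for row in rows
    if row["weight"] is not None and row["mpg"] is not None
]
weight_mpg_corr = pearson(
    [weight for weight, _ in complete_pairs],
    [mpg for _, mpg in complete_pairs],
)

# Overall distribution: a three-number summary of one column.
mpg_values = column(rows, "mpg")
mpg_summary = {
    "min": min(mpg_values),
    "median": statistics.median(mpg_values),
    "max": max(mpg_values),
}

print("Rows with missing values:", incomplete_rows)
print("Missing values per column:", missing_per_column)
print("Correlation between weight and mpg:", round(weight_mpg_corr, 3))
print("mpg summary:", mpg_summary)
```

On a real dataset you would normally reach for a dataframe library rather than hand-written helpers, but the questions stay the same: which rows are incomplete, which columns are affected, how features move together, and how values are spread.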