3 Acquisition, storage, and retrieval

This chapter covers

  • Structuring data pipelines around a design pattern called the core data representation
  • Importing and exporting JSON and CSV data from text files and REST APIs
  • Importing and exporting data with MySQL and MongoDB databases
  • Creating flexible pipelines to convert data between different formats

This chapter covers a topic that’s crucial to the data-wrangling process: acquiring data from somewhere and then storing it locally so we can work with it efficiently and effectively.

First, we must import our data from somewhere: this is acquisition. We’ll then probably export the data to a database to make it convenient to work with: this is storage. We might also export the data to various other formats for reporting, sharing, or backup. Ultimately, we must be able to access our stored data to work with it: this is retrieval.

In chapter 1 we looked at an example of the data-wrangling process where data was imported from a MySQL database and exported to a MongoDB database. This is one possible scenario. How you work in any given situation depends on how the data is delivered to you, the requirements of your project, and the data formats and storage mechanisms that you choose to work with.
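The acquisition, storage, and retrieval steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the sample CSV text, the column names, the `quakes` table, and the helper names `acquire`, `store`, and `retrieve` are all hypothetical, and an in-memory SQLite database stands in for MySQL or MongoDB so the example runs without a server.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data; the columns are invented for illustration.
CSV_TEXT = "id,place,magnitude\n1,Alpha,4.5\n2,Beta,6.1\n"


def acquire(csv_text):
    """Acquisition: parse CSV text into a list of records (dicts)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"id": int(r["id"]), "place": r["place"], "magnitude": float(r["magnitude"])}
        for r in rows
    ]


def store(conn, records):
    """Storage: export the records to a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quakes "
        "(id INTEGER PRIMARY KEY, place TEXT, magnitude REAL)"
    )
    conn.executemany(
        "INSERT INTO quakes (id, place, magnitude) "
        "VALUES (:id, :place, :magnitude)",
        records,
    )
    conn.commit()


def retrieve(conn, min_magnitude):
    """Retrieval: query the database and rebuild the list of records."""
    cursor = conn.execute(
        "SELECT id, place, magnitude FROM quakes "
        "WHERE magnitude >= ? ORDER BY id",
        (min_magnitude,),
    )
    return [{"id": i, "place": p, "magnitude": m} for i, p, m in cursor]


# SQLite in memory is an assumption standing in for MySQL or MongoDB.
conn = sqlite3.connect(":memory:")
records = acquire(CSV_TEXT)
store(conn, records)
strong = retrieve(conn, 5.0)

# Export the retrieved records to another format (JSON) for reporting.
report = json.dumps(strong)
print(report)
```

Each stage takes or returns the same plain list of records, so an importer for one format can be paired with an exporter for another without either knowing about the other.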

3.1 Building out your toolkit

3.2 Getting the code and data

3.3 The core data representation

3.3.1 The earthquakes website

3.3.2 Data formats covered

3.3.3 Power and flexibility

3.4 Importing data

3.4.1 Loading data from text files

3.4.2 Loading data from a REST API

3.4.3 Parsing JSON text data

3.4.4 Parsing CSV text data

3.4.5 Importing data from databases

3.4.6 Importing data from MongoDB

3.4.7 Importing data from MySQL

3.5 Exporting data

3.5.1 You need data to export!

3.5.2 Exporting data to text files

3.5.3 Exporting data to JSON text files

3.5.4 Exporting data to CSV text files

3.5.5 Exporting data to a database

3.5.6 Exporting data to MongoDB

3.5.7 Exporting data to MySQL

3.6 Building complete data conversions

3.7 Expanding the process

Summary