3 Acquisition, storage, and retrieval
This chapter covers
- Structuring data pipelines around a design pattern called the core data representation
- Importing and exporting JSON and CSV data from text files and REST APIs
- Importing and exporting data with MySQL and MongoDB databases
- Creating flexible pipelines to convert data between different formats
This chapter covers a topic that’s crucial to the data-wrangling process: acquiring data from its source and storing it locally so we can work with it efficiently and effectively.
First, we must import our data from its source, such as a text file, a REST API, or an existing database: this is acquisition. We’ll probably then export the data to a database of our own to make it convenient to work with: this is storage. We might also export the data to various other formats for reporting, sharing, or backup. Ultimately, we must be able to access our data to work with it: this is retrieval.
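To make the three stages concrete, here is a minimal sketch in Python. It rests on several assumptions that are ours, not the chapter's: the sample CSV data is invented, the function names (`acquire`, `store`, `retrieve`, `export_json`) are hypothetical, and an in-memory SQLite database stands in for a database such as MySQL or MongoDB so the example runs without a server. Between stages the data is held as a plain list of records, which is one possible way to picture an in-memory representation shared by every step of a pipeline.

```python
import csv
import io
import json
import sqlite3


def acquire(csv_text):
    """Acquisition: parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def store(conn, records):
    """Storage: write the records to a database table."""
    # Hypothetical table and column names, chosen to match the sample data.
    conn.execute("CREATE TABLE IF NOT EXISTS readings (city TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO readings (city, temp_c) VALUES (?, ?)",
        [(r["city"], float(r["temp_c"])) for r in records],
    )
    conn.commit()


def retrieve(conn):
    """Retrieval: read the records back as a list of dicts."""
    rows = conn.execute(
        "SELECT city, temp_c FROM readings ORDER BY rowid"
    ).fetchall()
    return [{"city": city, "temp_c": temp_c} for city, temp_c in rows]


def export_json(records):
    """Export: serialize the records to JSON for reporting or sharing."""
    return json.dumps(records, indent=2)


# Invented sample data standing in for a CSV file or REST API response.
sample_csv = "city,temp_c\nBrisbane,24.5\nHobart,12.0\n"

# SQLite in memory is an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
store(conn, acquire(sample_csv))
records = retrieve(conn)
report = export_json(records)
print(report)
```

Because each function either produces or consumes the same list-of-records shape, any stage can be swapped out (a different source, a different database, a different export format) without touching the others.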
In chapter 1 we looked at an example of the data-wrangling process where data was imported from a MySQL database and exported to a MongoDB database. This is one possible scenario. How you work in any given situation depends on how the data is delivered to you, the requirements of your project, and the data formats and storage mechanisms that you choose to work with.