10 Incorporating External Data into Analyses
This chapter covers
- The value of third-party and external data sources
- Retrieving and processing data from an API
- Web scraping and mining unstructured data
- Tapping into public sources of data
Think of the datasets we’ve used in this book. We compiled daily rat sightings in New York City and daily weather information for New York City and Boston, and we worked with datasets describing customers, transactions, and production costs. None of these datasets fell out of the sky or were readily available for download. We used multiple approaches to retrieve, structure, and create them so they were ready for analysis.
Figure 10.1 Each of these datasets was retrieved/constructed for a specific analytical purpose.
Many data sources you access for analysis will be in a raw, unprocessed state, and data teams must put considerable effort into making them suitable for analysis. Unless your job only involves working with the organization’s highly curated data warehouse, you will eventually need to take part in the data retrieval and structuring process to derive value from the information. Though rarely highlighted in postings for the analytics roles you will apply for, data retrieval and structuring are the backbone of any meaningful analysis.