10 Incorporating external data into analyses
This chapter covers
- The value of third-party and external data sources
- Retrieving and processing data from an API
- Web scraping and mining unstructured data
- Tapping into public sources of data
Think of the datasets we’ve used in this book. We’ve worked with several sources of information, often combining some of them into a single resource we can use to answer questions. Most notably,
- We explored a dataset containing the number of reported rat sightings in New York City.
- We tracked historical weather information in New York City and Boston.
- We analyzed customer login and transaction data for various hypothetical companies.
None of these datasets fell out of the sky, and none were available to download in the exact format needed to cover each topic in this book. Preparing them for analysis took multiple approaches to retrieving, structuring, and creating the data.
This chapter delves into common methods for retrieving data from sources such as APIs, websites, and public databases. We’ll explore the common formats in which data is delivered so that you can extract the information relevant to your analytical needs. Beyond running the associated Python code, we’ll focus on the mindset an analyst needs to strategically seek out information that enriches the data available at their organization.
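As a small preview of what working with common formats can look like, the sketch below parses the same records from a JSON payload and a CSV payload using only Python’s standard library. The payloads, the field names (`city`, `sightings`), and the helper functions are hypothetical and made up for illustration; they aren’t drawn from the datasets used in this book.

```python
import csv
import io
import json

# Hypothetical payloads: the same two made-up records in two common formats.
json_payload = '[{"city": "New York", "sightings": 12}, {"city": "Boston", "sightings": 7}]'
csv_payload = "city,sightings\nNew York,12\nBoston,7\n"


def records_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def records_from_csv(text):
    """Parse CSV text into a list of dicts, converting counts to int."""
    rows = csv.DictReader(io.StringIO(text))
    # CSV values arrive as strings, so numeric fields need explicit conversion.
    return [{"city": r["city"], "sightings": int(r["sightings"])} for r in rows]


json_records = records_from_json(json_payload)
csv_records = records_from_csv(csv_payload)
```

Once both payloads are parsed into the same structure, the rest of an analysis no longer needs to care which format the data arrived in. Note that JSON preserves the numeric type, while CSV requires an explicit conversion.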