5 Unusual data sources

This chapter covers

  • Thinking of data beyond what is available in structured formats
  • Creatively using all the data sources available to you, regardless of their format
  • Navigating the tradeoff between time spent and value added when working with additional data sources

Most datasets you will encounter in your career are not as clean and structured as those provided in a learning environment. In practice, it is often the analyst who must search for the right data, which may be buried in complicated spreadsheets or, harder still, in unstructured, nontraditional sources. This chapter is about practicing the creativity needed to identify and use novel, unstructured data sources to answer interesting analytical questions.

5.1 Identifying novel data sources

5.1.1 Considerations for using new datasets

5.2 Project 4: Analyzing film industry trends using PDF data

5.2.1 Problem statement

5.2.2 Data dictionary

5.2.3 Desired outcomes

5.2.4 Required tools

5.3 Applying the results-driven method to extracting data from PDFs

5.4 An example solution: Effects of the COVID-19 lockdown periods on the film industry

5.4.1 Inspecting the available data

5.4.2 Extracting data from PDFs

5.4.3 Analyzing the data extracted from PDFs

5.4.4 Project conclusions and recommendations

5.5 Closing thoughts on exploring novel data sources

5.5.1 Skills for exploring unusual data sources for any project

Summary