
Foreword
In the modern world, data is everywhere. Applications run by governments and companies collect data about the world and about our actions. We walk around with smartphones, which constantly collect data about our movements, purchases, and preferences, and then share that information with a wide variety of companies.
The good news is that this data makes it easier than ever to ask interesting questions about the world, ourselves, and our customers, and to get coherent answers. The bad news is that finding the data you need to solve a problem isn't trivial. You then need to clean that data and modify it to suit your purposes. Only when you have wrestled the data into submission can you finally start to perform analysis. And then, once you have answered your questions, you have to decide how to present your analysis to others.
In other words, analyzing data involves much more than just analyzing it. Most of your time in a data project will be spent searching, retrieving, cleaning, editing, and producing reports. Each of these steps can be frustrating in its own right, and each requires practice and understanding.
But for a beginner, it's worse than that, because it's not clear where to start. Even if you have lots of experience with Python and pandas, that doesn't mean you know how to solve problems, much as knowing how to use a hammer and a screwdriver doesn't necessarily make you qualified to take on a carpentry project.