Part 1. Preparing and gathering data and knowledge

 

The process of data science begins with preparation. You need to establish what you know, what you have, what you can get, where you are, and where you would like to be. The last of these is of utmost importance: a data science project needs a purpose and corresponding goals. Only when you have well-defined goals can you begin to survey the available resources and all the possibilities for moving toward those goals.

Part 1 of this book begins with a chapter discussing my process-oriented perspective on data science projects. After that, we move on to the deliberate and important step of setting good goals for the project. The subsequent three chapters cover the three most important data-centric steps of the process: exploration, wrangling, and assessment. By the end of this part, you’ll be intimately familiar with the data you have and the relevant data you can get. More importantly, you’ll know whether and how that data can help you achieve the goals of the project.