Part 1. The data science process
Extracting meaningful insights from data involves many factors. Among other things, you need to know what sorts of questions you hope to answer, how you are going to go about answering them, what resources and how much time you’ll need, and how you will measure the success of your project. Once you have answered those questions, you can consider what data you need, where and how you’ll get it, and what sort of preparation and cleaning it will need. Then, after exploring the data, comes the actual data modelling, arguably the “science” part of “data science.” Finally, you’re likely to present your results and possibly productionize your process.
Thinking about data science within a framework like the one above increases your chances of getting worthwhile results from the time and effort you spend on a project. This chapter, “The data science process” from Introducing Data Science, by Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, lays out the steps in a mature data science process. While you don’t need to be strictly bound by these steps, and may spend less time on some of them or even skip them, depending on your project, this framework will help keep your project on track.