Chapter 1. The data science process
This chapter covers
- Defining data science project roles
- Understanding the stages of a data science project
- Setting expectations for a new data science project
The data scientist is responsible for guiding a data science project from start to finish. Success in a data science project comes not from access to any one exotic tool, but from having quantifiable goals, good methodology, cross-discipline interactions, and a repeatable workflow.
This chapter walks you through what a typical data science project looks like: the kinds of problems you encounter, the types of goals you should have, the tasks that you’re likely to handle, and what sort of results are expected.
Data science is not performed in a vacuum. It’s a collaborative effort that draws on a number of roles, skills, and tools. Before we talk about the process itself, let’s look at the roles that must be filled in a successful project. Project management has been a central concern of software engineering for a long time, so we can look there for guidance. In defining the roles here, we’ve borrowed some ideas from Fredrick Brooks’s The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley, 1995) “surgical team” perspective on software development and also from the agile software development paradigm.
Let’s look at a few recurring roles in a data science project in table 1.1.