Chapter 2. The data science process
This chapter covers
- Understanding the flow of a data science process
- Discussing the steps in a data science process
The goal of this chapter is to give an overview of the data science process without diving into big data yet. You’ll learn how to work with big data sets, streaming data, and text data in subsequent chapters.
Following a structured approach to data science helps you maximize your chances of success in a data science project at the lowest cost. It also makes it possible to take up a project as a team, with each team member focusing on what they do best. Take care, however: this approach isn't suitable for every type of project, nor is it the only way to do good data science.
The typical data science process consists of six steps through which you’ll iterate, as shown in figure 2.1.
Figure 2.1 summarizes the data science process and shows the main steps and actions you’ll take during a project. The following list is a short introduction; each of the steps will be discussed in greater depth throughout this chapter.
1. The first step of this process is setting a research goal. The main purpose here is to make sure all the stakeholders understand the what, how, and why of the project. In every serious project, this will result in a project charter.