Chapter 1 The data science process
Chapter 2 from Introducing Data Science by Davy Cielen, Arno D. B. Meysman, and Mohamed Ali.
This chapter covers:
Understanding the flow of a data science process
Discussing the steps in a data science process
The goal of this chapter is to give an overview of the data science process without diving into big data yet. You’ll learn how to work with big data sets, streaming data, and text data in subsequent chapters.
2.1 Overview of the data science process
Following a structured approach to data science helps you maximize your chances of success in a data science project at the lowest cost. It also makes it possible to take up a project as a team, with each team member focusing on what they do best. Take care, however: this approach may not suit every type of project, and it isn’t the only way to do good data science.
The typical data science process consists of six steps through which you’ll iterate, as shown in figure 2.1.
Figure 2.1 The six steps of the data science process
Figure 2.1 summarizes the data science process and shows the main steps and actions you’ll take during a project. The following list is a short introduction; each step is discussed in greater depth later in this chapter.