5 Diving into the problem
This chapter covers:
- Getting and verifying access to the data
- Revisiting, verifying, and refining business understanding
- Developing UX and model utilization concepts
- Getting the versioning and pipelining system in place and working
- Building the initial pipelines to deliver a data set to the team
- Starting to build data tests to make your pipelines robust
In sprint 1, the team sets up and starts using the infrastructure that supports the delivery project, and then opens up the data that will underpin the ML project. To crack the data open, the team uses the infrastructure it has just built, particularly the pipelines and testing systems.
The tasks in the sprint 1 backlog are described in this chapter (S1.1 - S1.4) and in chapter 6 (S1.5 - S1.7). Sprint 1 prepares you for the core ML activity of creating and evaluating useful models. The work is to dig deeper into the data resources and to develop the team’s expertise and ability to use them for modeling. You also need to build the supporting infrastructure that lifts and shifts the data from where it’s resting to where you need it.