This chapter covers:
- Undertaking an exploratory data analysis (EDA) to discover the statistical characteristics of the data
- Exploring the properties of unstructured data using foundation models
- Checking the project’s ethical, privacy, and security aspects
- Building baseline models to get feedback about the potential for success
- Providing support for estimating performance of more sophisticated models
In chapter 5, we learned about the work required to get a data resource that the team can work with for modelling. Now the team can dive into the data to understand its characteristics and discern what can and what can’t be done with it. To do this, the team needs to work in a structured way, exploring the data, investigating it with a range of tools, and documenting and sharing the insights learned.
An important part of this work is for the team to revisit the ethical issues surrounding the project. This is essential because ethical concerns can shut down lines of investigation and development. It’s important to determine whether that will be the case before spending the client’s money on development that will never be used or exploited.