2 Using generative AI to ensure sufficient data quality
This chapter covers
- Best practices for ensuring high quality of data
- Using generative AI to prepare a data cleaning protocol
- Evaluating data content quality
- Dealing with data errors
- Investigating unclear data
In MS Excel, you can calculate the trend line and standard deviation of a sample on the basis of just two data points. Clearly, such “data analysis” is meaningless. This chapter will help you focus your efforts on what you should do with data, rather than on everything you can do with it. It provides the background you need for any analysis you may wish to perform. You will learn about best practices and non-negotiable rules that ensure your conclusions reflect the business activities you’re analyzing, rather than flaws in the underlying data.
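To see why the two-point case is meaningless, it may help to run the arithmetic yourself. The following is a minimal sketch in Python rather than Excel; the `months` and `sales` values are made-up numbers used purely for illustration.

```python
import statistics

# Hypothetical sample: two monthly sales figures (made-up numbers).
months = [1, 2]
sales = [100.0, 130.0]

# A straight line through two points is fully determined by those points.
slope = (sales[1] - sales[0]) / (months[1] - months[0])
intercept = sales[0] - slope * months[0]

# Residuals: the gap between each observation and the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(months, sales)]

# The sample standard deviation is defined for n = 2, but it rests on
# a single degree of freedom.
sample_std = statistics.stdev(sales)

print(f"trend: sales = {slope:.1f} * month + {intercept:.1f}")
print(f"residuals: {residuals}")
print(f"sample standard deviation: {sample_std:.2f}")
```

The line passes through both points exactly, so every residual is zero. The "perfect fit" says nothing about how well the trend describes the business, and the standard deviation is just a rescaled difference between the two values. The tools return numbers without complaint; whether those numbers mean anything depends on the data behind them.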
You’ll develop a structured approach to quality assessment and assurance, purge your data of artifacts, identify its blind spots, and learn to weigh the benefits and risks of guesstimating missing pieces. Finally, you’ll learn to look at the collected data from a new perspective: how useful it is for the analysis you intend to perform.