14 Testing for data integrity and completeness
This chapter covers
- Data testing methods
- Incorporating data testing into data pipelines
- Applying the Snowflake data metric functions
- Alerting users when data metrics exceed thresholds
- Detecting data volume anomalies
Trustworthy data is the cornerstone of successful business intelligence and analytics solutions. To ensure that business users have high-quality data they can consume confidently, data engineers build data testing functionality into their data pipelines. They run data quality tests that check for data integrity and completeness and take action when test results fall short of the defined data quality standards.
In this chapter, we will learn how to incorporate data testing into data pipelines. First, we will describe and compare various data testing methods and add data testing steps to the data pipeline. Next, we will introduce the Snowflake data metric functions for monitoring data quality and show how to create user-defined data metric functions. Then we will design alerts that notify data engineers and business users when data quality metrics exceed the defined thresholds. Finally, we will learn how to use the Snowflake ML anomaly detection functionality to monitor data ingestion volumes and flag data volumes that deviate from the expected values.
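To preview what data metric functions look like in practice, here is a minimal sketch of calling a built-in Snowflake data metric function ad hoc and then attaching it to a table for scheduled measurement. The orders table and customer_id column are hypothetical placeholders, and the schedule interval is just one of the values Snowflake accepts.

-- Call a built-in data metric function ad hoc to count NULL values
-- in a column (table and column names are placeholders)
SELECT SNOWFLAKE.CORE.NULL_COUNT(SELECT customer_id FROM orders);

-- Set the metric schedule on the table, then attach the metric
-- so Snowflake evaluates it automatically on that schedule
ALTER TABLE orders SET DATA_METRIC_SCHEDULE = '60 MINUTE';
ALTER TABLE orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (customer_id);

We will walk through these commands, along with user-defined data metric functions and the alerts built on top of them, step by step in the sections that follow.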