14 Testing for data integrity and completeness

 

This chapter covers

  • Data testing methods
  • Incorporating data testing into data pipelines
  • Applying the Snowflake data metric functions
  • Alerting users when data metrics exceed thresholds
  • Detecting data volume anomalies

Trustworthy data is the cornerstone of successful business intelligence and analytics solutions. To ensure that business users have high-quality data they can consume confidently, data engineers include data testing functionality when building data pipelines. They perform data quality tests that check for data integrity and completeness and take action when test results don’t comply with the data quality standards.

In this chapter, we will learn how to incorporate data testing into data pipelines. First, we will describe and compare various data testing methods and add data testing steps to the data pipeline. Next, we will introduce the Snowflake data metric functions for monitoring data quality and show how to add user-defined data metric functions. Then we will design alerts that notify data engineers and business users when data quality metrics exceed defined thresholds. Finally, we will use the Snowflake ML anomaly detection functionality to monitor data ingestion volumes and flag volumes that deviate from expected values.
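To make the idea of a data quality test concrete before we turn to Snowflake, here is a minimal Python sketch of the general pattern: compute a completeness metric and an integrity metric, compare each against a threshold, and act on the failures. It is an illustration only, not the approach built later in the chapter. The column names (`partner_id`, `partner_name`), the sample rows, and the zero thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    value: int
    threshold: int

    @property
    def passed(self) -> bool:
        # A check passes while its metric stays at or below the threshold.
        return self.value <= self.threshold


def null_count(rows, column):
    """Completeness metric: number of rows where the column is missing."""
    return sum(1 for row in rows if row.get(column) is None)


def duplicate_count(rows, column):
    """Integrity metric: number of rows that repeat an earlier non-null key."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = row.get(column)
        if key is None:
            continue
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def run_checks(rows):
    # Hypothetical standards: no missing names and no repeated identifiers.
    return [
        CheckResult("null partner_name", null_count(rows, "partner_name"), 0),
        CheckResult("duplicate partner_id", duplicate_count(rows, "partner_id"), 0),
    ]


# Hypothetical sample data with one missing name and one repeated identifier.
sample_rows = [
    {"partner_id": 1, "partner_name": "Acme"},
    {"partner_id": 2, "partner_name": None},
    {"partner_id": 2, "partner_name": "Bolt"},
]

failed = [result for result in run_checks(sample_rows) if not result.passed]
for result in failed:
    # "Taking action" here is only a printed message; a real pipeline
    # might stop the load or send a notification instead.
    print(f"ALERT: {result.name} = {result.value} exceeds threshold {result.threshold}")
```

Keeping each metric as a small function that returns a number, separate from the threshold comparison, mirrors the split between measuring data quality and alerting on it that the rest of the chapter follows.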

14.1 Data testing methods

14.1.1 Performing data testing as steps in the pipeline

14.1.2 Performing data testing independently of the pipeline

14.2 Incorporating data testing steps in the pipeline

14.2.1 Constructing the partner data quality task

14.2.2 Constructing the product data quality task

14.2.3 Executing the pipeline with the data testing tasks

14.3 Applying the Snowflake data metric functions

14.3.1 System-defined data metric functions

14.3.2 User-defined data metric functions

14.3.3 Viewing data metric function details

14.4 Alerting users when data metrics exceed thresholds

14.5 Detecting data volume anomalies

14.5.1 Generating random data

14.5.2 Displaying data as a line chart in Snowsight

14.5.3 Working with the anomaly detection model

Summary