15 Hooking monitoring into a feedback process

 

This chapter covers:

  • The difference between validity and verification
  • Using a schema to validate data
  • Providing feedback quickly to improve usability and quality

In May 2010, as prime minister of the United Kingdom, David Cameron published a letter to all government departments requesting that they release details of spending above a threshold that depended on the type of department: local government departments had to publish transactions above £500, while central government departments had to publish spending over £25,000.

Her Majesty’s Treasury, the British government department responsible for public finance and economic policy, described how the data should be published: as CSV files with a prescribed set of fields and prescribed field names. This was the government standard with which every department had to comply.

When the expenditure data had been collected, almost 1,500 data files in total, the files were compared with the schema published by Her Majesty’s Treasury to see how many of them actually conformed to the standard. Only a handful were completely valid, so few that the share of valid files rounded to 0%. Anyone who wanted to analyze UK expenditure would have had to write custom processing routines, clean the CSV files, and do a lot of other work before the data could be used in an analysis.
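To make the idea of comparing files with a published standard concrete, here is a minimal sketch in Python. It only checks the header row of each CSV file against a list of required field names and reports the share of files that pass. The field names in REQUIRED_FIELDS and the helper names check_header and share_valid are hypothetical, chosen for illustration; they are not the fields from the Treasury guidance, and a real schema check would also look at types and values.

```python
import csv
import io

# Hypothetical required header for illustration only;
# this is NOT the actual HM Treasury field list.
REQUIRED_FIELDS = ["Department", "Date", "Supplier", "Amount"]


def check_header(csv_text, required=REQUIRED_FIELDS):
    """Return a list of problems found in the header row of csv_text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    # Every required field must be present under its exact name.
    for name in required:
        if name not in header:
            problems.append(f"missing field: {name}")
    # Fields outside the standard are reported as well.
    for name in header:
        if name not in required:
            problems.append(f"unexpected field: {name}")
    return problems


def share_valid(files):
    """Fraction of files (mapping of name to CSV text) with a clean header."""
    if not files:
        return 0.0
    valid = sum(1 for text in files.values() if not check_header(text))
    return valid / len(files)
```

Calling check_header on a file whose first column is named "Dept" instead of "Department" would report one missing field and one unexpected field, which is the kind of small deviation that makes a file fail the standard even though a human reader would understand it.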

15.1  TL;DR

15.2  Validity

15.2.1  Schemas

15.2.2  CSV schemas

15.2.3  Using data packages

15.3  Monitoring validity

15.3.1  Create the quality control

15.3.2  Try out the quality control

15.3.3  Provide feedback to those who can improve the data

15.4  Summary