12 Examining and testing naturally occurring number sequences
This chapter covers
- Benford’s law and naturally occurring number sequences
- Chi-square goodness of fit test
- Mean absolute deviation
- Distortion factor and Z-statistic
- Mantissa statistics
Numeric data sets that follow a Benford distribution exhibit a much higher frequency of smaller leading digits than larger leading digits. The phenomenon is most prevalent in numeric data that spans several orders of magnitude and is therefore better represented on a logarithmic scale than on a linear one.
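To make the shape of a Benford distribution concrete, the following minimal sketch computes the expected proportion of each leading digit from the standard formula P(d) = log10(1 + 1/d) and compares it with the leading digits of the first 1,000 powers of 2, a sequence that spans hundreds of orders of magnitude. The function names (`benford_expected`, `leading_digit`, `observed_frequencies`) and the choice of powers of 2 as sample data are assumptions made for illustration only; they are not taken from the chapter.

```python
import math
from collections import Counter


def benford_expected():
    """Expected proportion of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit in position 0,
    # for example 0.00456 becomes '4.560000000000000e-03'.
    return int(f"{abs(x):.15e}"[0])


def observed_frequencies(values):
    """Proportion of each leading digit 1-9 among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Illustrative data (an assumption, not from the chapter): powers of 2.
powers = [2 ** k for k in range(1, 1001)]

expected = benford_expected()
observed = observed_frequencies(powers)

for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

Under these assumptions, the expected proportions fall steadily from the digit 1 to the digit 9, and the observed proportions for the powers of 2 should track them closely. A data set confined to a narrow range, such as adult heights in centimeters, would not be expected to behave this way.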
Fraudsters often replace leading 1s and 2s with 8s and 9s on invoices, expense reports, tax returns, and the like to maximize their gains relative to their risks. They do so on the mistaken assumption that, regardless of the data set, 8s and 9s are just as probable as 1s and 2s, or that, when randomness prevails, larger digits may even occur more frequently than smaller ones. This is why Benford’s law is most often used in fraud detection: serious deviations from a Benford distribution may indicate fraudulent activity and therefore warrant further investigation. But there are other applications as well: data integrity, economic data analysis, scientific research, digital forensics, and population studies, to name a few. In short, Benford’s law is a valuable tool for uncovering potential anomalies in numeric data sets across many domains.