12 Examining and testing naturally occurring number sequences

This chapter covers

Benford’s law and naturally occurring number sequences
Chi-square goodness of fit test
Mean absolute deviation
Distortion factor and Z-statistic
Mantissa statistics

Numeric data sets that follow a Benford distribution exhibit a much higher frequency of smaller leading digits than larger leading digits: 1 is the leading digit about 30 percent of the time, whereas 9 leads less than 5 percent of the time. The phenomenon is most prevalent in numeric data that span several orders of magnitude, which is why such data are best represented on a logarithmic, rather than linear, scale.
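The skew toward smaller leading digits can be made concrete with the standard statement of Benford’s law, under which the probability of leading digit d is log10(1 + 1/d). The short Python sketch below tabulates those probabilities for digits 1 through 9; the function name benford_first_digit_probs is our own label for illustration, not something defined elsewhere in the chapter.

```python
import math

def benford_first_digit_probs():
    """Return {digit: probability} for leading digits 1-9 under Benford's law."""
    # Standard first-digit formula: P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()

# Print each digit with its expected share of leading-digit occurrences
for digit, p in probs.items():
    print(f"{digit}: {p:.3f}")
```

The nine probabilities sum to 1 and decline steadily from digit 1 to digit 9, which is the pattern the rest of the chapter tests real data against.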

Benford’s law is most often used in fraud detection. Fraudsters often make the mistake of changing leading 1s and 2s to 8s and 9s on invoices, expense reports, tax returns, and the like in order to maximize their gains relative to their risks. They assume that, regardless of the data set, 8s and 9s are just as probable as 1s and 2s, or that, when randomness prevails, larger digits should sometimes occur more frequently than smaller digits. Serious deviations from a Benford distribution may therefore indicate fraudulent activity and warrant further investigation. But there are other applications as well: data integrity, economic data analysis, scientific research, digital forensics, and population studies, to name just a few. In other words, Benford’s law is a valuable tool for uncovering potential anomalies in numeric data sets across several domains.
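To illustrate how that kind of tampering shows up, here is a minimal Python sketch under loud assumptions. The "clean" data are the first 1,000 powers of 2, a synthetic sequence known to track Benford’s law closely; the "tampered" copy crudely rewrites every leading 1 as a 9; and the deviation measure is one common form of the mean absolute deviation (the topic of section 12.5.2), averaged over the nine leading digits. The helper names leading_digit and mean_absolute_deviation are hypothetical, and none of the values are real financial data.

```python
import math
from collections import Counter

def leading_digit(value):
    """First significant digit of a nonzero number (assumes plain int/float input)."""
    # Strip any leading zeros and decimal point, then take the first character
    return int(str(abs(value)).lstrip("0.")[0])

def mean_absolute_deviation(values):
    """Average absolute gap between observed and Benford leading-digit proportions."""
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    gaps = [abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)]
    return sum(gaps) / len(gaps)

# Synthetic stand-in for a Benford-like data set (assumption: powers of 2)
clean = [2 ** k for k in range(1, 1001)]

# Crude illustrative tampering: every leading 1 becomes a leading 9
tampered = [
    int("9" + str(v)[1:]) if leading_digit(v) == 1 else v
    for v in clean
]

mad_clean = mean_absolute_deviation(clean)
mad_tampered = mean_absolute_deviation(tampered)

print(f"MAD, clean data:    {mad_clean:.4f}")
print(f"MAD, tampered data: {mad_tampered:.4f}")
```

The tampered copy should produce a noticeably larger deviation than the clean one. Real fraud is rarely this blunt, so a measure like this is a prompt for further investigation rather than proof.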

12.1 Benford’s law explained

12.2 Naturally occurring number sequences

12.3 Uniform and random distributions

12.3.1 Uniform distribution

12.3.2 Random distribution

12.3.3 Plotted distributions

12.4 Examples

12.4.1 Street addresses

12.4.2 World population figures

12.4.3 Payment amounts

12.5 Validating Benford’s law

12.5.1 Chi-square test

12.5.2 Mean absolute deviation

12.5.3 Distortion factor and Z-statistic

12.5.4 Mantissa statistics

12.6 Summary