An easy-to-apply tool associated with data quality is Digital Analysis, based on Benford's Law. (See glossary on page 23.) Nigrini1 describes how Benford had noticed that the front pages of books of logarithms used by scientists and engineers were far more worn than those in the back, and speculated that this was because numbers starting with 1, 2 or 3, which appeared at the front, were more common in nature than those with initial digits of 4 to 9. Benford2 specifically observed that when data sets of various types are ranked from smallest to largest, the shape approximates a geometric sequence in which each member of the series is larger than the previous one by a fixed ratio. This geometric (logarithmic progression) assumption is the basis of what is known as Benford's Law, which observes that nature counts geometrically and appears to both build and function accordingly.
The concept of the approach is that, given certain conditions, the distribution of digits in large data sets is expected to match the distribution defined by Benford's Law. Specifically, data sets should conform to Benford's Law if they describe sizes of similar phenomena, have no built-in maximum or minimum values, do not consist of assigned numbers and have more small items than big items.
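The expected first-digit distribution can be made concrete with a short sketch. It uses the standard statement of Benford's Law, in which the expected proportion of values with first digit d is log10(1 + 1/d), and compares those proportions with the first digits of a geometric sequence. The function names, the ratio and the sequence length are illustrative assumptions, not part of the original study.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Expected proportion of first digit d (1-9) under Benford's Law."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def geometric_sequence(start: float, ratio: float, n: int) -> list:
    """n terms, each larger than the previous one by a fixed ratio."""
    return [start * ratio ** k for k in range(n)]


def first_digit_proportions(values) -> dict:
    """Observed proportion of each first digit among the positive values."""
    counts = Counter(first_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Assumed example: 500 terms spanning five powers of ten (ratio = 10 ** 0.01).
sequence = geometric_sequence(1.0, 10 ** 0.01, 500)
observed = first_digit_proportions(sequence)

for d in range(1, 10):
    print(f"digit {d}: expected {benford_expected(d):.4f}, observed {observed[d]:.4f}")
```

The sequence is chosen to span whole powers of ten; a geometric sequence covering only part of a decade would not be expected to match the Benford proportions as closely.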
Overview of Related Research
A non-mathematical review of Benford's Law with intuitive explanations of "the first digit phenomenon" is provided by Raimi.3 The tool has been described as useful for checking the reasonableness of forecasts,4 assessing managers' tendency to round5 and detecting fraudsters' invented numbers.6 The mathematics literature has presented a proof of Benford's Law7 and has demonstrated both that Benford's Law resembles the central limit theorem, in that it acts as the central limit theorem for digits under multiplicative operations,8 and that random samples drawn from randomly chosen distributions converge to Benford's distribution.9 Benford's Law is scale invariant.10 Other related literature spans the past two centuries.11 This "primer" builds on such prior research and is intended as an introduction using government data. From this foundation, readers are encouraged to explore more extensive cases, software and related literature.12
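Scale invariance means that multiplying every value by a constant, for example when converting units or currencies, leaves the first-digit distribution essentially unchanged. The sketch below illustrates this on a hypothetical data set (values spread evenly on a logarithmic scale over six powers of ten); the data, the scale factors and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def max_deviation_from_benford(values) -> float:
    """Largest absolute gap between observed and expected first-digit proportions."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    return max(
        abs(counts.get(d, 0) / total - math.log10(1 + 1 / d))
        for d in range(1, 10)
    )


# Hypothetical data: 6,000 values evenly spaced in log10 from 1 up to 1,000,000.
values = [10 ** (k / 1000) for k in range(6000)]

# Arbitrary illustrative scale factors (1.0 leaves the data unchanged).
for scale in (1.0, 2.54, 0.3048, 1609.344):
    rescaled = [v * scale for v in values]
    print(f"scale {scale}: max deviation {max_deviation_from_benford(rescaled):.4f}")
```

A data set that does not already follow Benford's Law, such as values spread evenly on an ordinary (linear) scale, would not generally show this stability under rescaling.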
Method
To assess the likely applicability of Benford's Law to a given data set, compute descriptive statistics, comparing the results to ensure that the mean...