“…Most research uses supervised machine learning techniques to classify addresses, but with the exception of Akcora et al [1] and Paquet-Clouston et al [42], researchers do not have access to high-quality labelled datasets. Instead, researchers use synthetic, fake, or unverified data: Ashfaq et al [2] use a synthetic dataset; Rabieinejad et al [44] generate fake labels; Dahiya et al [11] use an unverified Kaggle dataset; Pham and Lee [43] use unverified labels for "30 thieves" of unknown provenance; and Sankar Roy et al [47] reuse the Pham and Lee dataset. Alternatively, researchers use very simple heuristics (such as node degree patterns; see Weber et al [55], and Lorenz et al [34], who use the Weber et al dataset) or slightly more complex heuristics (such as motifs; see Wu et al [58]) and assume that such patterns are evidence of complex criminal behaviour, such as money laundering.…”