“…However, until recently few cross-references could be found between the statistical and the computer science communities. While statisticians and epidemiologists speak of record or data linkage [17], the computer science and database communities often refer to the same process as data or field matching, data scrubbing, data cleaning [18,35], data cleansing [28], preprocessing, duplicate detection [5], entity uncertainty, or the object identity problem. In commercial processing of customer databases or business mailing lists, data linkage is sometimes called merge/purge processing [23], data integration [11], list washing, or ETL (extraction, transformation and loading).…”