Background: Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. Methods: A new protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers has been developed. The protocol is based on Bloom filters on q-grams of identifiers. Results: Tests on simulated and actual databases yield linkage results comparable to non-encrypted identifiers and superior to results from phonetic encodings. Conclusion: We proposed a protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers. Since the protocol can be easily enhanced and has a low computational burden, the protocol might be useful for many applications requiring privacy-preserving record linkage.
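The idea of Bloom filters on q-grams can be illustrated with a minimal Python sketch. It is not the authors' implementation: the filter length `m`, the number of hash functions `k`, the double-hashing scheme built from two keyed HMACs, the padding with blanks, and the Dice coefficient as similarity measure are all assumptions chosen for illustration, as are the function names.

```python
import hashlib
import hmac


def qgrams(value, q=2):
    """Split a string into its set of q-grams, padded with blanks (assumed convention)."""
    padded = " " * (q - 1) + value.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_filter(value, key, m=1000, k=30, q=2):
    """Return the set of bit positions set to 1 in an m-bit Bloom filter.

    Each q-gram is mapped to k positions via double hashing with two
    keyed hash functions (an assumption made for this sketch).
    """
    positions = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


secret = b"shared-secret-key"  # hypothetical key known only to the data owners
smith = bloom_filter("Smith", secret)
smyth = bloom_filter("Smyth", secret)
jones = bloom_filter("Jones", secret)

print(dice(smith, smyth))  # similar names share most q-grams, so similarity is high
print(dice(smith, jones))  # unrelated names overlap only through chance collisions
```

Because similar identifiers share most of their q-grams, their filters share most of their set bits, so a similarity threshold on the encrypted filters can tolerate typing errors without the plain-text names being exchanged.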
In panel studies on sensitive topics, respondent-generated identification codes are often used to link records across surveys. However, a substantial number of cases are usually lost because the codes from different surveys fail to match. These losses may cause biased estimates. Using more code components and linking the codes with the Levenshtein string distance function reduces the losses. In a simulation study and two field experiments, the proposed procedure outperforms the methods previously applied.
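Linking codes by Levenshtein distance can be sketched as follows. This is an illustration under assumptions, not the procedure evaluated in the study: the example codes, the threshold `max_distance`, and the rule of skipping ambiguous ties are hypothetical, and the matching is a simple nearest-neighbour search that does not enforce one-to-one links.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def link_codes(wave1, wave2, max_distance=1):
    """Link each wave-1 code to its nearest wave-2 code within max_distance.

    Codes whose best match is tied with another candidate are left unlinked.
    """
    links = {}
    for code in wave1:
        scored = sorted((levenshtein(code, other), other) for other in wave2)
        if not scored or scored[0][0] > max_distance:
            continue
        if len(scored) > 1 and scored[1][0] == scored[0][0]:
            continue  # ambiguous: two candidates are equally close
        links[code] = scored[0][1]
    return links


# Hypothetical self-generated codes from two survey waves
wave1 = ["ANB07K", "XKL12M"]
wave2 = ["ANB01K", "QQQ99Z"]
print(link_codes(wave1, wave2))
```

With exact matching, the first respondent would be lost because one character of the code differs between waves; allowing a distance of 1 recovers that link while leaving the clearly different codes unlinked.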
The evaluation of the German Mammography Screening Program requires record linkage with data from cancer registries in order to measure the number of false-negative mammograms and interval cancers. This study aims to evaluate the performance of the established linkage method based on identifiers encrypted by the standard procedure of the German cancer registries. In addition, the results are compared with an alternative method based on plain text identifiers. A total of 16,572 records from the Bremen Mammography Screening Pilot Study were linked with data from the Bremen Cancer Registry. Based on a gold standard set of matching record pairs, homonym and synonym errors were determined. Given the customary threshold value in cancer registries, the plain text method showed a lower rate of synonym errors (2.1-5.1%) and a lower rate of homonym errors (0.01-0.15%). As 10.4 million women are invited to take part biennially in screening, the corresponding figures would be 3,237 homonym errors for the standard procedure and 294 using the plain text method, assuming equivalent conditions. The 11-fold increase in the homonym error rate documents the trade-off for better data protection using encrypted data.
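The 11-fold figure follows directly from the two projected error counts quoted above, as this small check shows (the variable names are illustrative only):

```python
# Projected homonym errors among 10.4 million invited women, as quoted in the abstract
homonym_errors_encrypted = 3237   # standard procedure with encrypted identifiers
homonym_errors_plain_text = 294   # alternative method with plain text identifiers

fold_increase = homonym_errors_encrypted / homonym_errors_plain_text
print(f"{fold_increase:.1f}-fold")  # prints "11.0-fold"
```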