2014
DOI: 10.1016/j.jbi.2013.12.003

Privacy-preserving record linkage on large real world datasets

Abstract: Record linkage typically involves the use of dedicated linkage units who are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of…

Citations: cited by 105 publications (121 citation statements)
References: 19 publications
“…Bloom filter-based record linkage has been used in real-world medical applications, such as in Brazil (Napoleão Rocha 2013), Germany (Schnell 2014) and Switzerland (Kuehni et al 2011). The largest application so far has been an Australian study (Randall et al 2014). Here, healthcare data with more than 26 million records have been used.…”
Section: { S Sm M I It T H H } (mentioning)
confidence: 99%
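For readers unfamiliar with the technique named in the statement above, the following is a minimal sketch of a basic Bloom filter encoding of a name field, compared with the Dice coefficient. It is an illustration under stated assumptions, not the method of any of the cited studies: the function names, the filter length `m`, the number of hash functions `k`, the padding character, the double-hashing construction and the shared secret are all hypothetical choices made for the example.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a string into padded character bigrams, e.g. 'ab' -> {'_a', 'ab', 'b_'}."""
    padded = "_" + value.lower() + "_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, m=1000, k=10, secret=b"shared-secret"):
    """Return the set of bit positions set in an m-bit Bloom filter.

    Assumption: each bigram is mapped to k positions by double hashing,
    (h1 + i * h2) mod m, using two keyed HMACs. The key is a placeholder
    that both data custodians would have to share.
    """
    positions = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(secret, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret, data, hashlib.sha256).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a, b):
    """Dice coefficient between two sets of bit positions (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = bloom_encode("smith")
    print(dice(smith, bloom_encode("smyth")))  # similar spelling, higher score
    print(dice(smith, bloom_encode("jones")))  # unrelated name, lower score
```

Because similar strings share most of their bigrams, their filters share most of their set bits, which lets a linkage unit compare encoded values approximately without seeing the names themselves. The same property is what frequency-based attacks on basic Bloom filters exploit.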
“…Recently, Niedermeyer et al proposed an easier attack and demonstrated that Bloom filter encodings can be broken without the need for high computational resources [31]. Unfortunately, the use of basic Bloom filters is still being proposed in the medical informatics community as a privacy-preserving method [34,38]. In order to address frequency attacks on basic Bloom filters, Durham et al propose combining multiple Bloom filters by using a statistically informed method of sampling [11].…”
Section: Related Work (mentioning)
confidence: 99%
“…British Columbia voter's list; Adly used datasets with 4,000, 10,000 and 20,000 records, generated by sampling from the list; manually controlled and identified the percentage of similar records between each set pair
Schnell et al [39]: Two German private administration databases, each with about 15,000 records
Durham et al [10]: Created 100 datasets with 1,000 records in each from the identifiers and demographics within the patient records in the electronic medical record system of the Vanderbilt University Medical Center; data sets to link to are generated from these 100 sets using a "data corrupter"
DuVall et al [14]: Used the enterprise data warehouse of the University of Utah Health Sciences Center; 118,404 known duplicate record pairs, identified using the Utah Population Database
Karakasidis et al [25]: Used the FEBRL synthetic data generator [6] for performance and accuracy experiments
Kuzu et al [26]: A sample of 20,000 records from the North Carolina voter's registration list; to evaluate the effect of typographical and semantic name errors, the sample was synthetically corrupted
Durham et al [11]: Ten independent samples of 100,000 records from the North Carolina voter's registration list; each sample was independently corrupted to generate samples at the second party
Dusetzina et al [12]: Individuals in the North Carolina Central Cancer Registry (NCCCR) diagnosed with colon cancer linked to enrollment and claims data for beneficiaries in privately insured health plans in North Carolina; 104,360 record pairs
Gruenheid et al [21]: Cora dataset; Biz dataset consisting of multiple versions of a business records dataset, each with 4,892 records
Randall et al [34]: Approximately 3.5 × 10^9 record pairs from ten years of the West Australian Hospital Admissions data; approximately 16 × 10^9 record pairs from ten years of the New South Wales admitted patient data
Schmidlin et al [38]: No experimental evaluation; timing estimated for a linkage attempt with 100,000 records in one data set and 50,000 records in another…”
Section: Linking Health Records For Federated Query Processing (mentioning)
confidence: 99%
“…Unfortunately, records to be linked across different datasets often lack unique identifiers for performing such an identifying and aggregating process [1]. To overcome this problem, many techniques have been developed for record linkage over the past decade [5] in various applications.…”
Section: Introduction (mentioning)
confidence: 99%