2014
DOI: 10.1109/tkde.2013.91

Composite Bloom Filters for Secure Record Linkage

Abstract: The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (e.g., Surname); however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance co…

Cited by 76 publications (84 citation statements)
References 36 publications
“…Therefore, instead of deleting characters, sampling can be used. Durham (2012) published a variation of CLKs, denoted by Durham et al (2013) as composite Bloom filters. Bit positions from separate Bloom filters for each identifier are sampled.…”
Section: Sampling Bits For Composite Bloom Filters (mentioning)
confidence: 99%
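
The sampling idea described in this statement can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bigram tokenization, the SHA-256 based hashing, the filter length of 64 bits, the per-field sample sizes, the shared integer key, and sampling with replacement are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import hashlib
import random


def bigrams(value):
    """Split a padded, lower-cased string into overlapping 2-character tokens."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def field_bloom_filter(value, num_bits=64, num_hashes=4):
    """Encode one identifier's bigrams into its own Bloom filter (list of 0/1)."""
    bits = [0] * num_bits
    for token in bigrams(value):
        for seed in range(num_hashes):
            # Assumed hashing scheme: SHA-256 over "seed:token".
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:8], "big") % num_bits] = 1
    return bits


def composite_bloom_filter(record, sample_sizes, key=2013, num_bits=64):
    """Sample bit positions from each field-level filter and combine them."""
    # The same key and sample sizes yield the same positions for every record,
    # so encodings produced by different parties stay comparable.
    rng = random.Random(key)
    composite = []
    for field, size in sample_sizes.items():
        # Assumption: positions are drawn with replacement.
        positions = [rng.randrange(num_bits) for _ in range(size)]
        field_bits = field_bloom_filter(record[field], num_bits)
        composite.extend(field_bits[p] for p in positions)
    # Keyed permutation so a bit's position no longer reveals its source field.
    rng.shuffle(composite)
    return composite


def dice(a, b):
    """Dice coefficient of two equal-length bit lists."""
    total = sum(a) + sum(b)
    return 2 * sum(x & y for x, y in zip(a, b)) / total if total else 0.0


# Hypothetical records and per-field sample sizes.
sizes = {"surname": 40, "city": 24}
rec_a = {"surname": "Smith", "city": "Nashville"}
rec_b = {"surname": "Smyth", "city": "Nashville"}
cbf_a = composite_bloom_filter(rec_a, sizes)
cbf_b = composite_bloom_filter(rec_b, sizes)
print(len(cbf_a), round(dice(cbf_a, cbf_b), 3))
```

Giving a field a larger sample size lets it contribute more bits to the composite encoding, which is one way the relative influence of identifiers could be tuned in a scheme of this kind.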
“…Durham et al [75] proposed a protocol for probabilistic record linkage based on a Bloom filter with the objective of avoiding possible frequency-based cryptanalysis by encoding each identifier of a record with a separate Bloom filter. They introduced a method for encoding the set of identifiers of a record as a Bloom filter.…”
Section: Privacy-preserving Record Linkage (mentioning)
confidence: 99%
“…They introduced a method for encoding the set of identifiers of a record as a Bloom filter. In contrast to the best practice protocol, the PPRL protocols [73,75] do not allow manual review, as the identifiers are encrypted.…”
Section: Privacy-preserving Record Linkage (mentioning)
confidence: 99%
“…Unfortunately, the use of basic bloom filters is still being proposed in the medical informatics community as a privacy preserving method [34,38]. In order to address frequency attacks on basic bloom filters, Durham et al propose combining multiple Bloom filters by using a statistically informed method of sampling [11]. The method makes frequency attacks difficult, but requires the tuning of a security parameter that can affect linkage results.…”
Section: Related Work (mentioning)
confidence: 99%
“…British Columbia voter's list; Adly used datasets with 4,000, 10,000 and 20,000 records, generated by sampling from the list; manually controlled and identified the percentage of similar records between each set pair
Schnell et al [39]: Two German private administration databases, each with about 15,000 records
Durham et al [10]: Created 100 datasets with 1,000 records in each from the identifiers and demographics within the patient records in the electronic medical record system of the Vanderbilt University Medical Center; data sets to link to are generated from these 100 sets using a "data corrupter"
DuVall et al [14]: Used the enterprise data warehouse of the University of Utah Health Sciences Center; 118,404 known duplicate record pairs, identified using the Utah Population Database
Karakasidis et al [25]: Used the FEBRL synthetic data generator [6] for performance and accuracy experiments
Kuzu et al [26]: A sample of 20,000 records from the North Carolina voter's registration list; to evaluate the effect of typographical and semantic name errors, the sample was synthetically corrupted
Durham et al [11]: Ten independent samples of 100,000 records from the North Carolina voter's registration list; each sample was independently corrupted to generate samples at the second party
Dusetzina et al [12]: Individuals in the North Carolina Central Cancer Registry (NCCCR) diagnosed with colon cancer linked to enrollment and claims data for beneficiaries in privately insured health plans in North Carolina; 104,360 record pairs
Gruenheid et al [21]: Cora dataset; Biz dataset consisting of multiple versions of a business records dataset, each with 4,892 records
Randall et al [34]: Approximately 3.5 × 10^9 record pairs from ten years of the West Australian Hospital Admissions data; approximately 16 × 10^9 record pairs from ten years of the New South Wales admitted patient data
Schmidlin et al [38]: No experimental evaluation; timing estimated for a linkage attempt with 100,000 records in one data set and 50,000 records in another…”
Section: Linking Health Records For Federated Query Processing (mentioning)
confidence: 99%