2019
DOI: 10.1038/sdata.2018.298

Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes

Abstract: We develop an algorithm for probabilistic linkage of de-identified research datasets at the patient level, when only diagnosis codes with discrepancies and no personal health identifiers such as name or date of birth are available. It relies on Bayesian modelling of binarized diagnosis codes, and provides a posterior probability of matching for each patient pair, while considering all the data at once. Both in our simulation study (using an administrative claims dataset for data generation) and in two real use…
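To make the idea of a "posterior probability of matching" from binarized diagnosis codes concrete, here is a minimal Python sketch. It is not the authors' algorithm: it scores a single patient pair independently with a naive-Bayes style likelihood ratio, whereas the abstract describes a model that considers all the data at once. The function name `match_posterior` and the parameters `prior`, `miss`, and `spurious` are hypothetical, and their default values are illustrative assumptions only.

```python
import math


def match_posterior(a, b, prevalence, prior=0.01, miss=0.1, spurious=0.01):
    """Posterior probability that binary code vectors a and b describe one patient.

    Assumed toy model (not taken from the paper):
      - same patient: a code present in `a` is missing from `b` with
        probability `miss`; a code absent from `a` appears in `b` with
        probability `spurious`;
      - different patients: each code in `b` is an independent draw from its
        population prevalence.
    """
    llr = 0.0  # log-likelihood ratio, match vs. non-match
    for a_k, b_k, p_k in zip(a, b, prevalence):
        # P(b_k = 1 | a_k, same patient)
        p_one_match = (1.0 - miss) if a_k else spurious
        p_match = p_one_match if b_k else 1.0 - p_one_match
        # P(b_k | different patients)
        p_nonmatch = p_k if b_k else 1.0 - p_k
        llr += math.log(p_match) - math.log(p_nonmatch)

    # Combine with the prior odds of a match, then map log-odds to a probability.
    log_odds = math.log(prior / (1.0 - prior)) + llr
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical example: four diagnosis codes with made-up prevalences.
prevalence = [0.05, 0.10, 0.02, 0.30]
record = [1, 0, 1, 0]
p_same = match_posterior(record, [1, 0, 1, 0], prevalence)
p_diff = match_posterior(record, [0, 1, 0, 1], prevalence)
```

In this toy model, agreement on rare codes raises the posterior far more than agreement on common ones, which is why diagnosis codes alone can be informative enough for linkage.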

Cited by 29 publications (26 citation statements)
References 28 publications
“…Visualization‐based interfaces can play a key role in helping data owners, subjects, custodians, and consumers dynamically evaluate the disclosure risks of shared data. For data owners or custodians, visual interfaces [GHK*16, CRVFS15] can help communicate privacy risks by suggesting non‐obvious, probabilistic linkages [HWL*19], let them dynamically evaluate the trade‐offs among data utility and privacy risks [ADSZ*19] by visualizing privacy outcomes from new and evolving metrics [JSH*17], and make more confident decisions regarding data sharing [BVM*17].…”
Section: Gaps and Research Opportunities (mentioning)
confidence: 99%
“…A particularly interesting method has been advanced by Hejblum et al, in Boston [85]. It is clear that sufficient clinical details (such as diagnoses with encounter dates) may be enough to identify a subject uniquely, or very nearly so, in two coherent data sets.…”
Section: Results (mentioning)
confidence: 99%
“…Similar efforts have been employed to create virtual patients with specific measurements (i.e. glucose measurements) using mathematical models [24] or using very specific features [25] (non-demographics). The methodology proposed for our virtual population creation has similarities with the concept of Synthea, however, in our approach, a computational model to create additional virtual geometries has been also incorporated in our methodology.…”
Section: Discussion (mentioning)
confidence: 99%