2010
DOI: 10.1136/jamia.2009.002212

Effects of personal identifier resynthesis on clinical text de-identification

Abstract: The de-identification tool achieves high accuracy when training and test sets are homogeneous (ie, both real or resynthesized records). The resynthesis component regularizes the data to make them less "realistic," resulting in loss of performance particularly when training on resynthesized data and testing on real data.

Cited by 41 publications (30 citation statements)
References 15 publications
“…There are variations in resynthesis processes used to replace PHI in corpora [3,11,12]. For PHI involving numerical values such as dates, ids and phone numbers, approaches are usually based on digit replacement strategies.…”
Section: Introduction (mentioning)
confidence: 99%
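
The digit replacement strategy mentioned in this statement can be illustrated with a minimal sketch. It is not the method of any of the cited papers; the function name `replace_digits` and the sample string are hypothetical, and the sketch assumes that only ASCII digits need replacing while separators and letters stay untouched.

```python
import random

DIGITS = "0123456789"

def replace_digits(text, rng=None):
    """Replace every ASCII digit with a random digit.

    All other characters (separators, letters, spaces) are kept, so the
    surrogate has the same length and layout as the original value.
    """
    rng = rng or random.Random()
    return "".join(rng.choice(DIGITS) if ch in DIGITS else ch for ch in text)

# Hypothetical example value; pass a seeded generator for reproducibility.
surrogate = replace_digits("Phone: 555-0142, MRN 0098123", random.Random(0))
print(surrogate)
```

A replacement this naive can produce implausible values, such as a month of 13 in a date, so a practical resynthesis step would likely need type-specific constraints on top of it.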
“…Approaches for names are less similar. Uzuner et al [3] focused on generating a majority of out-of-vocabulary names, while Yeniterzi et al [11] used names from a dictionary. In their work on Swedish clinical notes, Alfalahi et al [12] used names from dictionaries while also introducing some letter variations to allow for misspelled names, and kept the gender intact in first names.…”
Section: Introduction (mentioning)
confidence: 99%
“…Yeniterzi et al [19] evaluated the effectiveness and examined possible bias introduced by re-synthesis on de-identification software. The main research motivation was that real medical records for the development and evaluation of de-identification software are hardly available.…”
Section: Related Work (mentioning)
confidence: 99%
“…[5], [22]). In [20], pseudonymization is achieved by first separating the identification data from the anamnesis data which is then stored in a separate database referenced with so called unique data identification codes (DIC) as pseudonyms.…”
Section: Pseudonymization (mentioning)
confidence: 99%
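
The separation described in this statement can be sketched roughly as follows. This is an illustration under stated assumptions, not the design of the system in [20]: the function `pseudonymize`, the field names, the use of in-memory dictionaries as stand-ins for the two databases, and the use of a random UUID as the data identification code (DIC) are all hypothetical.

```python
import uuid

def pseudonymize(record, id_fields, identity_store, clinical_store):
    """Split a record into identifying and clinical parts linked by a DIC.

    identity_store and clinical_store stand in for two separate databases.
    The DIC is assumed here to be a random UUID; the cited work may
    generate its codes differently.
    """
    dic = uuid.uuid4().hex
    identity_store[dic] = {k: v for k, v in record.items() if k in id_fields}
    clinical_store[dic] = {k: v for k, v in record.items() if k not in id_fields}
    return dic

# Hypothetical record and field names.
identity_db, clinical_db = {}, {}
record = {"name": "Jane Doe", "phone": "555-0142", "anamnesis": "persistent cough"}
dic = pseudonymize(record, {"name", "phone"}, identity_db, clinical_db)
print(clinical_db[dic])
```

Under these assumptions, the clinical store holds no identifying fields, and re-identification requires access to the identity store together with the DIC.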