2012
DOI: 10.1186/1471-2288-12-109

Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

Abstract: Background: The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act “Saf…

Cited by 39 publications (28 citation statements)
References 12 publications (20 reference statements)
“…As such, in our current evaluation we could not thoroughly evaluate our 'Text De-identifier' module. Due to the various complexities involved with de-identifying free text, this module may need some improvement when working with narrative, free-text reports; perhaps reusing some of the techniques discussed by other researchers [15,18] may help achieve a high degree of accuracy. A third limitation is that for zip codes, we retained only the first three characters and replaced the rest with zeros; however, for zip code de-identification to be fully compliant with HIPAA regulations, it needs to meet additional criteria where according to the current publicly available data from the Bureau of the Census, the geographic unit formed by combining all zip codes with the same three initial digits should contain more than 20,000 people.…”
Section: Discussion (mentioning)
confidence: 98%
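The zip code rule described in the statement above can be illustrated with a minimal sketch. It assumes the caller supplies a set of three-digit prefixes whose combined area contains more than 20,000 people, built from Census data; the function name `generalize_zip` and the prefix values in the usage note are hypothetical placeholders, not real Census figures and not the cited authors' implementation.

```python
def generalize_zip(zip_code: str, populous_prefixes: set) -> str:
    """Keep the first three digits of a zip code and zero out the rest.

    populous_prefixes is an assumed, caller-supplied set of three-digit
    prefixes whose combined geographic unit contains more than 20,000
    people. Any other prefix is replaced with "000".
    """
    # Drop separators so ZIP+4 forms such as "12345-6789" are accepted.
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError("expected at least five digits in the zip code")

    prefix = digits[:3]
    if prefix not in populous_prefixes:
        # Sparsely populated area: the prefix itself must not be released.
        prefix = "000"

    # The remaining two digits are always replaced with zeros.
    return prefix + "00"
```

For example, with the placeholder set `{"123"}`, `generalize_zip("12345-6789", {"123"})` yields `"12300"`, while any zip code with a prefix outside the set is reduced to `"00000"`.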
“…al. [15] explored various data de-identification techniques for clinical documents within the VA, while Gupta et al.…”
Section: Discussion (mentioning)
confidence: 99%
“…The ability to specify the optimization objective in terms of different F-scores is an advantage, as it gives some amount of control over the precision-recall trade-off. In the case of deidentification, for instance, recall is arguably of more importance than precision, as argued in [35], where F2-score was used to evaluate various deidentification systems.…”
Section: Discussion (mentioning)
confidence: 99%
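The precision-recall trade-off mentioned in the statement above follows from the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta = 2 weights recall more heavily than precision. The sketch below is a generic illustration of that formula; the function name `f_beta` and the example values are assumptions, not results from any cited system.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 favors recall, beta < 1 favors precision.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0, beta must be > 0")

    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; the score is defined as 0 here.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only: one system misses PHI, the other over-redacts.
high_precision = f_beta(precision=1.0, recall=0.5, beta=2.0)
high_recall = f_beta(precision=0.5, recall=1.0, beta=2.0)
```

With beta = 2, the high-recall case scores higher than the high-precision case, which matches the argument that missing PHI is costlier than removing too much text.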
“…Methods are usually based on pattern matching and dictionaries, or on machine learning algorithms. Some are more generalizable than others, and certain methods perform better with some types of PHI than others [71,72]. Recent examples such as MIST [73], BoB [74], Anonym [75], and several systems developed for the i2b2 NLP challenges [76,77], allow for good accuracy and very limited impact on clinical information.…”
Section: Impact On Reuse (mentioning)
confidence: 99%