2017
DOI: 10.1016/j.jbi.2017.10.003

De-identification of medical records using conditional random fields and long short-term memory networks

Abstract: The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional…

Cited by 29 publications (21 citation statements)
References 29 publications (47 reference statements)

“…To incorporate features from local vocabulary, we utilized a feature embedding layer to incorporate linguistic and knowledge-based features with character and word embeddings [25]. We extracted two most important linguistic features, part-of-speech and word shape, according to previous works [27, 30, 31]. Knowledge-based features are derived from local vocabulary, which is different from the word embeddings that derived from unlabeled clinical text.…”
Section: Methods (mentioning)
confidence: 99%
“…Our previous study [25] has proved that the knowledge-based feature embedding layer improved the performance of clinical NER by integrating knowledge features with word embeddings. Chen et al [27] and Jiang et al [30] both showed that the knowledge-based features as complimentary resources to word embeddings improved the performance of identifying PHIs.…”
Section: Methods (mentioning)
confidence: 99%
“…An estimated 80% of all data in EHRs reside in clinical notes [22,23] and are a rich source of data, but their unstructured format makes them complex and difficult to de-identify. Recent methods for identification of the clinical notes have achieved above 90% in accuracy and F1 scores [24][25][26]. However, this does not constitute as fully PHI-free data and poses a barrier for health systems to share data legally.…”
Section: Discussion (mentioning)
confidence: 99%
“…De-identification system Machine learning S1 (Zhao, Zhang, Ma, and Li (2018)), S2 (Chen, Cullen, and Godwin (2015)) S3 (Dernoncourt, Lee, Uzuner, and Szolovits (2017)), S4 (Yadav, Ekbal, Saha, Pathak, and Bhattacharyya (2017)), S5 ), S6 ) Hybrid S7 (Yang and Garibaldi (2015)) S8 (Liu, Tang, Wang, and Chen (2017)) S9 (Lee, Dernoncourt, Uzuner, and Szolovits (2016)) S10 (Dehghan, Kovacevic, Karystianis, Keane, and Nenadic (2015)) S11 (Yang and Garibaldi (2015)) S12 (He, Guan, Cheng, Cen, and Hua (2015)) S13 (Liu, Chen, Tang, Wang, Chen, Li, Wang, Deng, and Zhu (2015)) S14 (Phuong and Chau (2016)) S15 (Bui, Wyatt, and Cimino (2017a)) S16 (Jiang, Zhao, He, Guan, and Jiang (2017)) S17 (Lee, Wu, Zhang, Xu, Xu, and Roberts (2017)) S18 (Shweta, Kumar, Ekbal, Saha, and Bhattacharyya (2016)) In this section, we outline the most significant achievement of automating end-to-end de-identification system: improving accuracy. It has been argued that as far as de-identification is concerned, perfection cannot be achieved; however, 95% accuracy is considered to be the rule of thumb and universally accepted value ; ).…”
Section: Architecture (mentioning)
confidence: 99%