2019
DOI: 10.1186/s12911-019-0935-4
A study of deep learning methods for de-identification of clinical notes in cross-institute settings

Abstract: Background: De-identification is a critical technology for enabling the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested great effort in developing methods and corpora for the de-identification of clinical notes. These annotated corpora are valuable resources for developing automated systems to de-identify clinical text at local hospitals. However, existing studies often utilized training and test data coll…

Cited by 55 publications (40 citation statements)
References 25 publications
“…The model utilizes a CRF layer to decode the LSTM hidden states into BIO tags. We screened four different word embeddings, following a procedure similar to that reported in our previous study [46], and found that the Common Crawl embeddings—released by Facebook and trained with fastText on the Common Crawl data set [47]—achieved better performance than the other embeddings on a validation data set. Thus, we used the Common Crawl embeddings for all LSTM-CRFs models.…”
Section: Methods
confidence: 99%
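The BIO tagging scheme mentioned above can be sketched with a minimal helper that converts token-level PHI spans into BIO tags. The `to_bio` function, the example tokens, and the PHI labels here are illustrative assumptions, not the paper's actual annotation pipeline:

```python
def to_bio(tokens, spans):
    """Convert token-index PHI spans to BIO tags.

    tokens: list of token strings
    spans:  list of (start_idx, end_idx_exclusive, label) over token indices
    """
    tags = ["O"] * len(tokens)  # default: outside any PHI entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # B- marks the entity's first token
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # I- marks continuation tokens
    return tags

tokens = ["John", "Smith", "visited", "UF", "Health", "on", "May", "3"]
spans = [(0, 2, "NAME"), (3, 5, "HOSPITAL"), (6, 8, "DATE")]
print(to_bio(tokens, spans))
# ['B-NAME', 'I-NAME', 'O', 'B-HOSPITAL', 'I-HOSPITAL', 'O', 'B-DATE', 'I-DATE']
```

In the cited model, a CRF layer scores whole tag sequences so that invalid transitions (e.g., `O` directly followed by `I-NAME`) are discouraged, which is why decoding goes through the CRF rather than taking per-token argmax.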
“…We compared two training strategies: fine-tuning and training from scratch. For the fine-tuning approach, a deep learning model was first pre-trained on the de-identification dataset curated in the 2014 i2b2 challenge [25] to produce a base checkpoint. We then continuously fine-tuned this checkpoint (i.e., initialized new models with the weights from this checkpoint and used the same model settings) on the local UF datasets (i.e., different numbers of notes) developed in this study.…”
Section: Models and Training Strategies
confidence: 99%
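The warm-start idea behind fine-tuning can be sketched on a toy one-dimensional model: train on a "pre-training" corpus, then continue from that checkpoint on a nearby "local" task, versus starting from scratch. The `train` function and the synthetic datasets are illustrative stand-ins, not the paper's LSTM-CRF setup:

```python
def train(w, data, epochs=50, lr=0.1):
    # Plain SGD on a 1-D linear model y = w * x; a stand-in for
    # continuing optimization from a given weight initialization.
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

# "Pre-training" corpus (analogous to the 2014 i2b2 data): y = 3.0 * x
pretrain_data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5)]
# "Local" corpus (analogous to the UF notes): y = 3.2 * x, a nearby task
local_data = [(x, 3.2 * x) for x in (0.5, 1.0, 1.5)]

checkpoint = train(0.0, pretrain_data)                 # base checkpoint (w ~ 3.0)
fine_tuned = train(checkpoint, local_data, epochs=5)   # warm start from checkpoint
from_scratch = train(0.0, local_data, epochs=5)        # cold start, same budget
```

With an equal fine-tuning budget, the warm-started model ends closer to the local optimum than the cold-started one, which is the intuition behind reusing the i2b2-trained checkpoint when only a small number of local notes is available.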
“…We adopted the LSTM-CRFs model developed in our previous works [25,28] using TensorFlow [29]. We trained models using the short-training sets and selected the optimized model checkpoints according to the performances on the validation sets.…”
Section: Experiments and Evaluation
confidence: 99%
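Selecting the optimized checkpoint by validation performance, as described above, amounts to keeping the saved model with the best held-out score. A minimal sketch follows; the `select_best_checkpoint` helper and the example scores are hypothetical, not the authors' code:

```python
def select_best_checkpoint(checkpoints, validate):
    """Return the (name, score) of the checkpoint with the highest validation score.

    checkpoints: iterable of (name, model) pairs saved during training
    validate:    callable scoring a model on the held-out validation set
    """
    best_name, best_score = None, float("-inf")
    for name, model in checkpoints:
        score = validate(model)
        if score > best_score:          # keep the best-scoring checkpoint so far
            best_name, best_score = name, score
    return best_name, best_score

# Illustrative validation scores standing in for entity-level F1 per checkpoint
scores = {"epoch-10": 0.91, "epoch-20": 0.945, "epoch-30": 0.938}
best = select_best_checkpoint(scores.items(), lambda f1: f1)
print(best)  # ('epoch-20', 0.945)
```

In practice the `validate` callable would run the tagger over the validation notes and compute F1, but the selection logic is the same.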
“…De-identification of clinical notes is one of the most crucial prerequisites for utilizing clinical notes in other downstream biomedical informatics studies. Yang et al [5] explored de-identification in cross-institute settings using deep learning-based approaches: fine-tuning and pre-training. They pre-trained de-identification models, LSTM-CRF, on the University of Florida (UF) Health corpus and fine-tuned the models on i2b2 datasets.…”
Section: Topics
confidence: 99%