2015
DOI: 10.1016/j.jbi.2015.08.012

CRFs based de-identification of medical records

Abstract: De-identification is a shared task of the 2014 i2b2/UTHealth challenge. The purpose of this task is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, is introduced, and three groups of features are extracted to train the de-identifier model. The experiment shows that our system …
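
The preprocessing step described in the abstract (regular-expression tokenization followed by feature extraction for a CRF) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the pattern `TOKEN_RE`, the helper names `tokenize` and `token_features`, and the feature set are hypothetical and are not the paper's actual tokenizer or its three feature groups, which the truncated abstract does not specify.

```python
import re

# Hypothetical pattern (not from the paper): a run of letters, a run of digits,
# or any single non-whitespace symbol. This splits "03/14/2014" into its parts
# so that date and ID fragments become separate tokens.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|\S")


def tokenize(text):
    """Return (token, start, end) triples; offsets let labels map back to the record."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def token_features(tokens, i):
    """Build an illustrative per-token feature dict of the kind a CRF toolkit consumes."""
    tok = tokens[i][0]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_title": tok.istitle(),
        "length": len(tok),
        # Neighbouring tokens give the sequence model local context.
        "prev": tokens[i - 1][0].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1][0].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = tokenize("Seen 03/14/2014 by Dr.Smith")
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Keeping character offsets alongside each token is one way to replace detected PHI spans in the original text after labeling; the feature dicts would then be passed, one list per record, to a CRF library of choice.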

citations
Cited by 40 publications
(35 citation statements)
references
References 13 publications
“…Over the years, the effective but highly complex rules-based methods [20] have given way to machine learning systems where large amounts of data with easy-to-extract features were available for the training phase, [21]-[23] or to hybrid systems capable of detecting entities even in cases of scarcity of data and complex features, provided that they are more sophisticated and take more time to be implemented [24]-[27]. These machine learning (ML) algorithms have modeled the NER problem, i.e.…”
Section: A Rules and Machine Learning Approaches (mentioning)
confidence: 99%
“…Lots of teams from all around the world participated in this three challenges. In the two i2b2 NLP challenges, the proposed de-identification systems may fall in three categories: rule-based [26], machine learning-based [10, 14, 27], and hybrid [11, 12, 13, 15]. The rule-based systems can exactly recognize formulaic PHI instances (i.e., phone numbers, emails, licenses, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…One evidence is that the performance of the participating systems in the 2016 track was poorer than that in the 2014 track (maximum = 0.936, median = 0.845). Another is that we re-trained our participating system in 2014 track (F1 = 0.924) [32] directly on the 2016 training set, and achieved a much lower F1 score of 0.823 on the test set.…”
Section: Discussion (mentioning)
confidence: 99%