The 2019 n2c2/UMass Lowell shared task on clinical concept normalization

Luo, Yen-Fu; Henry, Sam; Wang, Yanshan; Shen, Feichen; Uzuner, Özlem; Rumshisky, Anna

doi:10.1093/jamia/ocaa106

Cited by 31 publications

(17 citation statements)

References 58 publications

(65 reference statements)

“…There are a few similar works to our vector space model, CNN-triplet (Mondal et al, 2019), BIOSYN (Sung et al, 2020), RoBERTa-Node2Vec (Pattisapu et al, 2020), and TTI (Henry et al, 2020). CNN-triplet is a two-step approach, requiring a generator to generate candidates for training the triplet network, and requiring various embedding resources as input to CNN-based encoder.…”

Section: Related Work (mentioning)

confidence: 99%

“…Research on concept normalization has grown thanks to shared tasks such as disorder normalization in the 2013 ShARe/CLEF (Suominen et al, 2013), chemical and disease normalization in BioCreative V Chemical Disease Relation (CDR) Task, and medical concept normalization in 2019 n2c2 shared task (Henry et al, 2020), and to the availability of annotated data (Dogan et al, 2014; Luo et al, 2019). Existing approaches can be divided into three categories: rule-based approaches using string-matching or dictionary look-up (Leal et al, 2015; D'Souza and Ng, 2015; Lee et al, 2016), which rely heavily on handcrafted rules and domain knowledge; supervised multi-class classifiers (Limsopatham and Collier, 2016; Lee et al, 2017; Tutubalina et al, 2018; Niu et al, 2019; Li et al, 2019), which cannot generalize to concept types not present in their training data; and two-step frameworks based on a nontrained candidate generator and a supervised candidate ranker (Leaman et al, 2013; Li et al, 2017; Liu and Xu, 2017; Nguyen et al, 2018; Murty et al, 2018; Mondal et al, 2019; Ji et al, 2020; Xu et al, 2020), which require complex pipelines and fail if the candidate generator does not find the gold truth concept.…”

Section: Introduction (mentioning)

confidence: 99%

Triplet-Trained Vector Space and Sieve-Based Search Improve Biomedical Concept Normalization

Xu, Bethard

2021

Proceedings of the 20th Workshop on Biomedical Language Processing

Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is critical for mining and analyzing biomedical texts. We propose a vector-space model for concept normalization, where mentions and concepts are encoded via transformer networks that are trained via a triplet objective with online hard triplet mining. The transformer networks refine existing pre-trained models, and the online triplet mining makes training efficient even with hundreds of thousands of concepts by sampling training triples within each mini-batch. We introduce a variety of strategies for searching with the trained vector-space model, including approaches that incorporate domain-specific synonyms at search time with no model retraining. Across five datasets, our models that are trained only once on their corresponding ontologies are within 3 points of state-of-the-art models that are retrained for each new domain. Our models can also be trained for each domain, achieving new state-of-the-art on multiple datasets.
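The triplet objective with in-batch mining described in this abstract can be illustrated with a minimal sketch. It assumes the common "batch-hard" variant (for each anchor, the farthest same-concept example and the nearest different-concept example in the mini-batch), Euclidean distance, and a margin of 0.5; the abstract does not specify these details, and the function names and toy vectors are hypothetical. A real system would compute this on transformer embeddings inside a deep learning framework.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.5):
    """Mean batch-hard triplet loss over one mini-batch.

    For each anchor, the hardest positive is the farthest item with the same
    concept label and the hardest negative is the nearest item with a
    different label. Anchors lacking either are skipped.
    (Assumed variant; not confirmed as the cited paper's exact method.)
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if j != i and l == label]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two vectors for hypothetical concept "A", one for concept "B".
# The "B" vector sits between the two "A" vectors, so the margin is violated.
toy_embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
toy_labels = ["A", "A", "B"]
loss = batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=0.5)
```

Because triplets are selected from whatever is already in the mini-batch, no separate candidate generator is needed at training time, which is the efficiency point the abstract makes.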

“…Recently, a new corpus, called MCN [3], was created exclusively for the clinical term normalization task, which also includes clinical terms of other semantic types. This corpus was provided as the data set for 2019 n2c2 Track 3 [17], a shared task for clinical term normalization. In this paper, we describe our system that we had submitted for this shared task.…”

Section: Introduction (mentioning)

confidence: 99%

“…Our system, UWM, achieved an accuracy of 80.79% on the test data set of the MCN corpus, which ranked sixth among the 33 system submissions and was behind by only 1.15% (absolute) to the second ranked system (81.94%) and was well above the mean (74.26%) and the median (77.33%) of all the participating systems [17]. The top system scored 85.26% and used a massive end-to-end deep learning architecture.…”

Section: Introduction (mentioning)

confidence: 99%

Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching: System Development and Evaluation

Kate

2021

JMIR Med Inform

Background: Clinical terms mentioned in clinical text are often not in their standardized forms as listed in clinical terminologies because of linguistic and stylistic variations. However, many automated downstream applications require clinical terms mapped to their corresponding concepts in clinical terminologies, thus necessitating the task of clinical term normalization.

Objective: In this paper, a system for clinical term normalization is presented that utilizes edit patterns to convert clinical terms into their normalized forms.

Methods: The edit patterns are automatically learned from the Unified Medical Language System (UMLS) Metathesaurus as well as from the given training data. The edit patterns are generalized sequences of edits that are derived from edit distance computations. The edit patterns are both character based as well as word based and are learned separately for different semantic types. In addition to these edit patterns, the system also normalizes clinical terms through the subconcepts mentioned within them.

Results: The system was evaluated as part of the 2019 n2c2 Track 3 shared task of clinical term normalization. It obtained 80.79% accuracy on the standard test data. This paper includes ablation studies to evaluate the contributions of different components of the system. A challenging part of the task was disambiguation when a clinical term could be normalized to multiple concepts.

Conclusions: The learned edit patterns led the system to perform well on the normalization task. Given that the system is based on patterns, it is human interpretable and is also capable of giving insights about common variations of clinical terms mentioned in clinical text that are different from their standardized forms.
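The notion of a word-based edit pattern in this abstract can be illustrated with a rough sketch. It assumes that a pattern is a list of steps in which unchanged words are generalized to a count while changed words are kept literally, and it uses Python's `difflib.SequenceMatcher` alignment as a stand-in for a true edit distance computation. The function names, the pattern representation, and the example terms are hypothetical and are not taken from the cited system.

```python
from difflib import SequenceMatcher


def word_edit_pattern(term, normalized):
    """Derive a generalized word-level edit pattern from one (term, normalized) pair.

    Unchanged runs become ("KEEP", n_words); changed runs become
    (OP, old_words, new_words) with OP in REPLACE / DELETE / INSERT.
    """
    src, tgt = term.lower().split(), normalized.lower().split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    pattern = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pattern.append(("KEEP", i2 - i1))
        else:
            pattern.append((op.upper(), " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return pattern


def apply_pattern(pattern, term):
    """Apply a learned pattern to a new term; return None if it does not fit."""
    words = term.lower().split()
    out, pos = [], 0
    for step in pattern:
        if step[0] == "KEEP":
            out.extend(words[pos:pos + step[1]])
            pos += step[1]
        else:
            _, old, new = step
            old_words = old.split()
            if words[pos:pos + len(old_words)] != old_words:
                return None  # the literal part of the pattern is absent
            pos += len(old_words)
            out.extend(new.split())
    # The pattern must consume the whole term, no more and no less.
    return " ".join(out) if pos == len(words) else None


# Invented example pair, for illustration only.
pattern = word_edit_pattern("hx of diabetes", "history of diabetes")
rewritten = apply_pattern(pattern, "hx of asthma")
```

Generalizing the unchanged words to a count is what lets a pattern learned from one term rewrite a different term, and the explicit steps are what make such a system human interpretable.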

“…Recently, a new corpus, called MCN [3], was created exclusively for the clinical term normalization task, which also includes clinical terms of other semantic types. This corpus was provided as the data set for 2019 n2c2 Track 3 [17], a shared task for clinical term normalization. In this paper, we describe our system that we had submitted for this shared task.…”

Section: Introduction (mentioning)

confidence: 99%

Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching: System Development and Evaluation (Preprint)

Kate

2020

Preprint

Clinical terms mentioned in clinical text are often not in their standardized forms as listed in clinical terminologies due to linguistic and stylistic variations, thus necessitating the task of normalization. In this paper, a system for clinical term normalization is presented which utilizes patterns to convert clinical terms into their normalized forms. These patterns are automatically learned from UMLS as well as from a given training corpus. The patterns are generalized sequences of edits which are derived from edit distance computation. The patterns are both character-based and word-based and are learned separately for different semantic types. Besides these patterns, the system also normalizes clinical terms through the subterms mentioned in them. The system was evaluated on the MCN corpus as part of the 2019 n2c2 Track 3 shared task of clinical term normalization. It obtained 80.79% accuracy on the standard test data. The paper includes an ablation study to evaluate contributions of various components of the system. A challenging part of the task, which accounted for a loss of 5% in absolute accuracy, was disambiguation when a clinical term could be normalized to multiple concepts. Given that the system is based on patterns, it is human-interpretable and also capable of giving insights into the common forms in which clinical terms could be found in clinical text which are different from their standardized forms.
