“…Motivation. Clustering biomedical terms into concepts in the UMLS Metathesaurus was formalized as a vocabulary alignment problem, identified as the UMLS Vocabulary Alignment (UVA) or synonymy prediction task, by Nguyen et al. (2021). UVA differs from other biomedical ontology alignment efforts, such as those of the Ontology Alignment Evaluation Initiative (OAEI), in its extremely large problem size: it requires comparing 8.7M biomedical terms pairwise (as opposed to tens of thousands of pairs in OAEI datasets).…”
Section: Introduction
confidence: 99%
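The scale claim in the snippet above can be checked with a quick back-of-the-envelope computation: comparing n terms pairwise requires n(n−1)/2 comparisons. The OAEI figure below is an illustrative assumption (the snippet only says "tens of thousands of pairs").

```python
# Back-of-the-envelope estimate of the UVA problem size:
# comparing n items pairwise requires n * (n - 1) / 2 comparisons.
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

uva_pairs = pairwise_comparisons(8_700_000)  # 8.7M UMLS atom strings
oaei_pairs = pairwise_comparisons(50_000)    # illustrative OAEI-scale dataset

print(f"UVA:  ~{uva_pairs:.2e} candidate pairs")   # on the order of 10^13
print(f"OAEI: ~{oaei_pairs:.2e} candidate pairs")  # on the order of 10^9
```

This is why exhaustive pairwise comparison is infeasible for UVA and a scalable learned model is needed.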
“…UVA differs from other biomedical ontology alignment efforts, such as those of the Ontology Alignment Evaluation Initiative (OAEI), in its extremely large problem size: it requires comparing 8.7M biomedical terms pairwise (as opposed to tens of thousands of pairs in OAEI datasets). Nguyen et al. (2021) also introduced a scalable supervised learning approach based on a Siamese neural architecture that leverages the lexical information present in the terms. Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) is a language model (LM) based on the multi-layer, bidirectional Transformer architecture (Vaswani et al., 2017).…”
Section: Introduction
confidence: 99%
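To illustrate the Siamese idea mentioned above: a single shared encoder maps both terms into the same vector space, and a similarity score on the two embeddings drives the synonymy decision. The sketch below is not the authors' LexLM; it substitutes a toy character-trigram bag encoder for the learned encoder, purely to show the shared-encoder-plus-similarity structure.

```python
import math
from collections import Counter

def encode(term: str) -> Counter:
    """Toy shared encoder: bag of character trigrams.
    Stands in for the learned encoder of a real Siamese model."""
    s = f"##{term.lower()}##"
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def siamese_score(t1: str, t2: str) -> float:
    # Siamese setup: the identical encoder is applied to both inputs,
    # then similarity is computed on the two embeddings.
    return cosine(encode(t1), encode(t2))

print(siamese_score("myocardial infarction", "myocardial infarct"))
print(siamese_score("myocardial infarction", "fracture of femur"))
```

Because the encoder is shared, lexically close synonym candidates score higher than unrelated term pairs; a real system learns the encoder so that non-lexical synonyms also score highly.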
“…We identify BERT-based models (in this work, BERT-based models refer to BioBERT, BLUEBERT, SapBERT, and UmlsBERT) and use them as baselines without further pretraining or fine-tuning on the UVA task. Another baseline in our work is the LexLM provided by Nguyen et al. (2021). We then design experiments to pretrain UBERT from scratch (without using any trained weights from other biomedical or clinical BERT-based models), resulting in three variants of UBERT.…”
Section: Introduction
confidence: 99%
“…Nguyen et al. (2021) have described the background knowledge required to understand the UVA task; in this section we briefly summarize it.…”
Section: Introduction
confidence: 99%
“…The UMLS Metathesaurus contains approximately ten million English atom strings, each of which is linked to a concept. Since Nguyen et al. (2021) focus on assessing whether two atoms are synonymous and should be associated with the same concept, the problem is formulated as a similarity task. We retain this problem definition from Nguyen et al. (2021).…”
The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) has been developed for the UMLS Vocabulary Alignment (UVA) task. This work introduces UBERT, a BERT-based language model pretrained on UMLS terms via a supervised Synonymy Prediction (SP) task that replaces the original Next Sentence Prediction (NSP) task. The effectiveness of UBERT for the UMLS Metathesaurus construction process is evaluated on the UVA task. We show that UBERT outperforms LexLM as well as biomedical BERT-based models. Key to the performance of UBERT are the synonymy prediction task specifically developed for UBERT, the tight alignment of training data to the UVA task, and the similarity of the models used for pretraining UBERT.
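The abstract above describes replacing BERT's NSP objective with a supervised Synonymy Prediction task. A plausible sketch of how such SP training pairs could be assembled in BERT's sentence-pair format is below; the function name and the concept identifiers (CUIs) shown are illustrative placeholders, not the authors' actual pipeline.

```python
# Hypothetical sketch: build a Synonymy Prediction (SP) training example in
# BERT's sentence-pair format, [CLS] atom1 [SEP] atom2 [SEP]. The label is 1
# when both atom strings map to the same UMLS concept (CUI), else 0.
def make_sp_example(atom1: str, atom2: str, cui1: str, cui2: str):
    text = f"[CLS] {atom1} [SEP] {atom2} [SEP]"
    label = 1 if cui1 == cui2 else 0
    return text, label

# Placeholder CUIs for illustration only.
ex_pos = make_sp_example("Headache", "Cephalalgia", "C0000001", "C0000001")
ex_neg = make_sp_example("Headache", "Fever", "C0000001", "C0000002")

print(ex_pos)  # positive pair: same concept
print(ex_neg)  # negative pair: different concepts
```

Because each example keeps the standard pair-input shape, the SP head can slot in where the NSP classification head normally sits, which is what makes the substitution natural.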
Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new Bio-ML track at OAEI 2022.