Findings of the Association for Computational Linguistics: ACL 2022
DOI: 10.18653/v1/2022.findings-acl.21

RuCCoN: Clinical Concept Normalization in Russian

Abstract: We present RuCCoN, a new dataset for clinical concept normalization in Russian manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUIless) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP is lacking in both datasets and trained models,…

Cited by 4 publications (1 citation statement)
References 6 publications

“…In particular, the Russian part of the Unified Medical Language System (UMLS) ( Bodenreider 2004 ) includes three source vocabularies, and it still only amounts to 1.8% of the English UMLS in vocabulary and 1.36% in source counts ( NIH UMLS 2022 ). Currently, there are several annotated corpora for the extraction of diseases, drugs, and adverse drug reactions from social media and clinical records in Russian ( Tutubalina et al 2021 ; Nesterov et al 2022 ). A recent work on a Russian medical language understanding benchmark ( Blinov et al 2022 ) includes the RuDReC corpus ( Tutubalina et al 2021 ) for named entity recognition (NER).…”
Section: Introduction (mentioning)
confidence: 99%