RuCCoN: Clinical Concept Normalization in Russian

Nesterov, Alexandre; Zubkova, Galina; Miftahutdinov, Zulfat; Kokh, Vladimir; Tutubalina, Elena; Shelmanov, Artem; Alekseev, Anton; Avetisian, Manvel; Chertok, Andrey; Nikolenko, Sergey I.

doi:10.18653/v1/2022.findings-acl.21

Cited by 4 publications

(1 citation statement)

References 6 publications


“…In particular, the Russian part of the Unified Medical Language System (UMLS) (Bodenreider 2004) includes three source vocabularies, and it still amounts to only 1.8% of the English UMLS in vocabulary and 1.36% in source counts (NIH UMLS 2022). Currently, there are several annotated corpora for the extraction of diseases, drugs, and adverse drug reactions from social media and clinical records in Russian (Tutubalina et al. 2021; Nesterov et al. 2022). A recent work on a Russian medical language understanding benchmark (Blinov et al. 2022) includes the RuDReC corpus (Tutubalina et al. 2021) for named entity recognition (NER).…”

Section: Introduction (mentioning)

confidence: 99%

NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities

et al. 2023


Motivation: This paper describes NEREL-BIO, an annotation scheme and a corpus of PubMed abstracts in Russian, together with a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL (Loukachevitch et al., 2021) by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, which makes it suitable for domain transfer experiments. As an extension of the scheme used for NEREL, NEREL-BIO annotates nested named entities. In nested annotation, shorter entities are contained within longer ones, so entity spans overlap, which makes such entities harder to detect.

Results: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. Every annotated English PubMed abstract has a corresponding Russian counterpart. NEREL-BIO therefore has two specific features: it annotates nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension (MRC) models and report their results.

Availability: The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO.
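Why nested entities are harder to handle can be illustrated with a minimal Python sketch. It is not taken from the NEREL-BIO paper or repository. Entities are stored as token spans, and the hypothetical helpers `is_nested`, `to_flat_bio`, and `decode_bio` show that a flat BIO encoding, which assigns exactly one tag per token, cannot keep both an outer span and a span nested inside it. The phrase "lung cancer" and the labels `DISEASE` and `ANATOMY` are illustrative assumptions, not the dataset's actual tag set.

```python
from typing import List, Optional, Tuple

# A span is (start_token, end_token_exclusive, label).
# Labels used below are hypothetical, not the NEREL-BIO tag set.
Span = Tuple[int, int, str]


def is_nested(outer: Span, inner: Span) -> bool:
    """True if `inner` lies within `outer` and covers a different token range."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and (outer[0], outer[1]) != (inner[0], inner[1]))


def to_flat_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode spans as one BIO tag per token; later spans overwrite earlier ones.

    Because each token holds a single tag, overlapping spans cannot all survive.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def decode_bio(tags: List[str]) -> List[Span]:
    """Recover spans from a BIO sequence (an I- tag with a new label ends the span)."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        ends_span = (tag == "O" or tag.startswith("B-")
                     or (tag.startswith("I-") and tag[2:] != label))
        if ends_span:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    tokens = ["lung", "cancer"]
    gold = [(0, 2, "DISEASE"), (0, 1, "ANATOMY")]  # "lung" nested in "lung cancer"
    print(is_nested(gold[0], gold[1]))  # True
    tags = to_flat_bio(len(tokens), gold)
    print(tags)  # ['B-ANATOMY', 'I-DISEASE']
    print(decode_bio(tags))  # [(0, 1, 'ANATOMY')]: the outer DISEASE span is lost
```

The round trip drops the outer span, which is the core difficulty nested annotation poses for single-sequence taggers. Span-based or per-type formulations avoid it because they do not force a single label onto each token.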
