RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

Shavrina, Tatiana; Fenogenova, Alena; Emelyanov, Anton; Shevelev, D.V.; Artemova, Ekaterina; Malykh, Valentin; Mikhailov, Vladislav; Tikhonova, Maria A.; Chertok, Andrey; Evlampiev, Andrey

doi:10.48550/arxiv.2010.15925

Cited by 5 publications

(8 citation statements)

References 6 publications

“…We continue our work on Russian SuperGLUE [6] which follows the general language understanding evaluation methodology. Similarly to the English prototype, Russian benchmark includes a set of NLU tasks and a publicly available leaderboard.…”

Section: Russian Superglue Tasks (mentioning)

confidence: 99%

“…For the obtained test set we re-scored the human benchmark using the same annotation procedure in Yandex.Toloka task as described in [6] but on the new subset of the data. The human performance achieved 80.5% accuracy, while the best model performance on the leaderboard at present is 72.9% (RuBERT conversational).…”

Section: Russe (mentioning)

confidence: 99%

“…The central benchmarks in the field are GLUE [1] and SuperGLUE [2] projects for English, they include versatile tasks and allow competitive evaluation of the models on a public leaderboard. Recently, analogous general language understanding evaluation benchmarks have been developed for Chinese [3], French [4], Polish [5] and Russian [6]. RussianSuperGLUE provides nine novel Russian NLU tasks, a public leaderboard, count-based and transformer-based baselines, and human solver evaluation.…”

Section: Introduction (mentioning)

confidence: 99%

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

Fenogenova, Tikhonova, Mikhailov et al. 2022. Preprint.

Self Cite

In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which led to performance evaluation problems across a range of language understanding tasks. This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user experience and methodological improvements, including fixes of the benchmark vulnerabilities unresolved in the previous version: novel and improved tests for understanding the meaning of a word in context (RUSSE) along with reading comprehension and common sense reasoning (DaNetQA, RuCoS, MuSeRC). Together with the release of the updated datasets, we improve the benchmark toolkit based on jiant framework for consistent training and evaluation of NLP-models of various architectures which now supports the most recent models for Russian. Finally, we provide the integration of Russian SuperGLUE with a framework for industrial evaluation of the open-source models, MOROCCO (MOdel ResOurCe COmparison), in which the models are evaluated according to the weighted average metric over all tasks, the inference speed, and the occupied amount of RAM. Russian SuperGLUE is publicly available at https://russiansuperglue.com/.

“…In addition, a multilingual version of USE [11] embeddings is used. The mentioned models show state-of-the-art results on a number of NLP benchmarks [13], including those in Russian language [8], so it was natural to test them on the task of selecting the best headline for the cluster.…”

Section: Embeddings (mentioning)

confidence: 99%

“…For example, news aggregators actively use clustering algorithms to generate news feeds from different sources and to select a single headline. The recent progress in designing multilingual models [13], trained for dozens or even hundreds of languages at once, makes it possible to use them for monolingual tasks, particularly for Russian language tasks [8]. At the same time, Russian BERT-based models are actively evolving, and their comparison with more universal multilingual ones may be of interest.…”

Section: Introduction (mentioning)

confidence: 99%

Transformers for Headline Selection for Russian News Clusters

Pavel, Sopilnyak. 2021. Preprint.

In this paper, we explore various multilingual and Russian pre-trained transformer-based models for the Dialogue Evaluation 2021 shared task on headline selection. Our experiments show that the combined approach is superior to individual multilingual and monolingual models. We present an analysis of a number of ways to obtain sentence embeddings and learn a ranking model on top of them. We achieve the result of 87.28% and 86.60% accuracy for the public and private test sets respectively.

Winograd schemata and other datasets for anaphora resolution in Hungarian

Vadász, Ligeti-Nagy. 2022. ALing.

The Winograd Schema Challenge (WSC, proposed by Levesque, Davis & Morgenstern 2012) is considered to be the novel Turing Test to examine machine intelligence. Winograd schema questions require the resolution of anaphora with the help of world knowledge and commonsense reasoning. Anaphora resolution is itself an important and difficult issue in natural language processing, therefore, many other datasets have been created to address this issue. In this paper we look into the Winograd schemata and other Winograd-like datasets and the translations of the schemata to other languages, such as Chinese, French and Portuguese. We present the Hungarian translation of the original Winograd schemata and a parallel corpus of all the translations of the schemata currently available. We also adapted some other anaphora resolution datasets to Hungarian. We aim to discuss the challenges we faced during the translation/adaption process.
