Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian

Fenogenova, Alena; Mikhailov, Vladislav; Shevelev, Denis

doi:10.18653/v1/2020.coling-main.570

Cited by 7 publications

(6 citation statements)

References 20 publications


“…It is the first standardized set of diverse NLU benchmarks for Russian. Some of the instances for its datasets were translated from the corresponding tasks in the SuperGLUE, while the others were collected by the RSG authors from scratch [10].…”

Section: Previous Work (mentioning)

confidence: 99%

“…The MuSeRC dataset is collected for the reading comprehension task. It contains more than 900 paragraphs across 5 different domains: elementary school texts, news, fiction stories, fairy tales, and summaries of TV series and books [10]. Samples were collected based on the following criteria:…”

Section: Russian Multi-sentence Reading Comprehension (MuSeRC) (mentioning)

confidence: 99%

“…Exact Match (EM) is the exact match per each instance, i.e. each set of predictions should be the same as of the answers [10].…”

Section: Heuristic (mentioning)

confidence: 99%
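The Exact Match definition quoted above can be illustrated with a minimal sketch. It assumes that each instance is a question with a list of binary labels, one per candidate answer; the function name `exact_match` and the dictionary layout are hypothetical and are not taken from the cited papers or from the benchmark toolkit.

```python
from typing import Dict, Hashable, Sequence


def exact_match(
    predictions: Dict[Hashable, Sequence[int]],
    answers: Dict[Hashable, Sequence[int]],
) -> float:
    """Fraction of instances whose whole set of predicted labels equals the gold labels.

    The data layout (instance id -> list of 0/1 labels) is an assumption
    made for illustration only.
    """
    if not answers:
        return 0.0
    hits = sum(
        1
        for key, gold in answers.items()
        # An instance counts only if every label in it matches.
        if list(predictions.get(key, [])) == list(gold)
    )
    return hits / len(answers)


gold = {"q1": [1, 0, 0], "q2": [0, 1]}
pred = {"q1": [1, 0, 0], "q2": [1, 1]}
print(exact_match(pred, gold))  # q1 matches fully, q2 does not
```

Under this reading, a single wrong label makes the whole instance count as a miss, which is what makes the metric stricter than a per-answer score.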


Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks

Tatyana, Bystrova, Kapelyushnik et al. 2021

Computational Linguistics and Intellectual Technologies


Leaderboards like SuperGLUE are seen as important incentives for active development of NLP, since they provide standard benchmarks for fair comparison of modern language models. They have driven the world's best engineering teams as well as their resources to collaborate and solve a set of tasks for general language understanding. Their performance scores are often claimed to be close to or even higher than human performance. These results encouraged more thorough analysis of whether the benchmark datasets featured any statistical cues that machine-learning-based language models can exploit. For English datasets, it was shown that they often contain annotation artifacts. This allows solving certain tasks with very simple rules and achieving competitive rankings. In this paper, a similar analysis was done for the Russian SuperGLUE (RSG), a recently published benchmark set and leaderboard for Russian natural language understanding. We show that its test datasets are vulnerable to shallow heuristics. Often approaches based on simple rules outperform or come close to the results of the notorious pre-trained language models like GPT-3 or BERT. It is likely (as the simplest explanation) that a significant part of the SOTA models' performance in the RSG leaderboard is due to exploiting these shallow heuristics and has nothing in common with real language understanding. We provide a set of recommendations on how to improve these datasets, making the RSG leaderboard even more representative of the real progress in Russian NLU.




“…The new version of RuCoS involves the following updates. We doubled the size of the validation (7527 examples) and test (7257 examples) sets as described in [10]. We manually verified the crowd-worker annotations and corrected typos and annotation inconsistencies.…”

Section: RuCoS (mentioning)

confidence: 99%

“…The performance leaderboard is developed as well (see Figure 1). Besides, Russian SuperGLUE 1.1 involves minor bug fixes along with the support of the novel models for Russian: RuGPT3 models included in the list of models by HuggingFace library.…”

Section: Infrastructure Advances (mentioning)

confidence: 99%

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP-models

Fenogenova, Tikhonova, Mikhailov et al. 2021

Computational Linguistics and Intellectual Technologies

Self Cite


In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which led to performance evaluation problems across a range of language understanding tasks. This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user experience and methodological improvements, including fixes of the benchmark vulnerabilities unresolved in the previous version: novel and improved tests for understanding the meaning of a word in context (RUSSE) along with reading comprehension and common sense reasoning (DaNetQA, RuCoS, MuSeRC). Together with the release of the updated datasets, we improve the benchmark toolkit based on the jiant framework for consistent training and evaluation of NLP models of various architectures, which now supports the most recent models for Russian. Finally, we provide the integration of Russian SuperGLUE with a framework for industrial evaluation of open-source models, MOROCCO (MOdel ResOurCe COmparison), in which the models are evaluated according to the weighted average metric over all tasks, the inference speed, and the occupied amount of RAM. Russian SuperGLUE is publicly available at https://russiansuperglue.com/.


I’ve Got the “answer”!

Goloviznina, Kotelnikov 2024

Lecture Notes in Computer Science

