Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.570

Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian

Abstract: The paper introduces two Russian machine reading comprehension (MRC) datasets, called MuSeRC and RuCoS, which require reasoning over multiple sentences and commonsense knowledge to infer the answer. The former follows the design of MultiRC, while the latter is a counterpart of the ReCoRD dataset. The datasets are included in RussianSuperGLUE, the Russian general language understanding benchmark. We provide a comparative analysis and demonstrate that the proposed tasks are relatively more complex as compared to…

Cited by 7 publications (6 citation statements). References 20 publications.

“…It is the first standardized set of diverse NLU benchmarks for Russian. Some of the instances for its datasets were translated from the corresponding tasks in the SuperGLUE, while the others were collected by the RSG authors from scratch [10].…”
Section: Previous Work; citation type: mentioning
confidence: 99%
“…The MuSeRC dataset is collected for the reading comprehension task. It contains more than 900 paragraphs across 5 different domains: elementary school texts, news, fiction stories, fairy tales, and summaries of TV series and books [10]. Samples were collected based on the following criteria:…”
Section: Russian Multi-sentence Reading Comprehension (MuSeRC); citation type: mentioning
confidence: 99%
“…The new version of RuCoS involves the following updates. We doubled the size of the validation (7527 examples) and test (7257 examples) sets as described in [10]. We manually verified the crowd-worker annotations and corrected typos and annotation inconsistencies.…”
Section: RuCoS; citation type: mentioning
confidence: 99%
“…The performance leaderboard is developed as well (see Figure 1). Besides, Russian SuperGLUE 1.1 involves minor bug fixes along with the support of the novel models for Russian: RuGPT3 models included in the list of models by HuggingFace library.…”
Section: Infrastructure Advances; citation type: mentioning
confidence: 99%