2020
DOI: 10.1007/978-3-030-58219-7_1
SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis

Cited by 45 publications (31 citation statements)
References 18 publications
“…Moreover, additional training on English science QA in lower school levels has no significant effect on the overall accuracy. These results suggest that further investigation of fine-tuning with other multilingual datasets (Gupta et al., 2018; Lewis et al., 2020; Efimov et al., 2020; d'Hoffschmidt et al., 2020; Artetxe et al., 2020; Longpre et al., 2020) is needed in order to understand the domain-transfer benefits to science QA in Eχαµs, even if they are not in a multiple-choice setting (Khashabi et al., 2020). Applying domain-adaptive and task-adaptive pre-training (Gururangan et al., 2020) to the multilingual science QA might offer further potential benefits.…”
Section: Discussion
confidence: 86%
“…Other efforts focused on building bilingual datasets that are similar in spirit to SQuAD (Rajpurkar et al., 2016): extractive reading comprehension over open-domain articles. Such datasets are collected by crowdsourcing questions, following a procedure similar to that of Rajpurkar et al. (2016), in Russian (Efimov et al., 2020), Korean (Lim et al., 2019), and French (d'Hoffschmidt et al., 2020), or by translating existing English QA pairs into Spanish (Carrino et al., 2020).…”
Section: Related Work
confidence: 99%
“…Among them, some initiatives have been carried out in Chinese, Korean, and Russian, all built in a similar way to SQuAD 1.1. The SberQuAD dataset (Efimov et al., 2019) is a native Russian reading comprehension dataset made up of 50,000+ samples. The CMRC 2018 dataset (Cui et al., 2019) is a native Chinese reading comprehension dataset that gathers 20,000+ question-and-answer pairs.…”
Section: Reading Comprehension in Other Languages
confidence: 99%
“…Translated datasets have also been used in making cross-lingual benchmark datasets like XQuAD (Artetxe et al., 2019) and MLQA (Lewis et al., 2019). Aside from using translated datasets, there have also been attempts at curating large question answering datasets in multiple other languages, including French (d'Hoffschmidt et al., 2020), Korean (Lim et al., 2019), Russian (Efimov et al., 2020), and Chinese (Cui et al., 2018; Shao et al., 2018), and benchmark models like QANet (Yu et al., 2018), BiDAF (Seo et al., 2016), and BERT (Devlin et al., 2018) have been trained on them. In contrast to gathering translated or human-annotated datasets for model training, zero-shot transfer learning, where pretrained models are evaluated directly on a new language after task-specific training on question answering, has also been attempted on reading comprehension tasks (Artetxe et al., 2019; Hsu et al., 2019; Siblini et al., 2019).…”
Section: Question Answering in English
confidence: 99%