2016
DOI: 10.48550/arxiv.1606.05250
Preprint
SQuAD: 100,000+ Questions for Machine Comprehension of Text

Cited by 664 publications (817 citation statements)
References 21 publications
“…In principle, as a dual task of QA, any QA dataset can be used for QG [50]. SQuAD [58], MS-MARCO [4] and NewsQA [73] are three well-known datasets used for answer-extraction QG, collected from Wikipedia, Bing search logs, and CNN news respectively. Unlike the previous three datasets, NarrativeQA [35] does not restrict answers to spans of text in the articles; it can therefore be used as an answer-abstraction QG dataset.…”
Section: Related Work 2.1 Question Generation (mentioning)
confidence: 99%
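The span restriction mentioned above can be made concrete with a short, self-contained sketch. The record below is invented purely for illustration but follows the published SQuAD schema (a context, a question, and answers given as character-indexed spans of the context); it shows the defining property of answer-extraction data and how a (context, answer span) → question pair for QG is derived.

```python
# Minimal sketch of an answer-extraction QG training pair in the SQuAD format.
# The example text is an invented placeholder; real SQuAD entries use the same
# schema: context, question, and answers stored as text plus answer_start.
context = ("SQuAD was released in 2016 and contains over 100,000 "
           "question-answer pairs on Wikipedia articles.")
question = "How many question-answer pairs does SQuAD contain?"
answer_text = "over 100,000"
answer_start = context.find(answer_text)  # character offset, as in SQuAD's answer_start field

record = {
    "context": context,
    "question": question,
    "answers": {"text": [answer_text], "answer_start": [answer_start]},
}

# The defining property of answer-extraction datasets: the answer is a literal
# span of the context (this is exactly what NarrativeQA does NOT require).
assert context[answer_start:answer_start + len(answer_text)] == answer_text

# For answer-extraction QG, the model conditions on the context and the answer
# span and must generate the question.
qg_input = (record["context"], (answer_start, answer_start + len(answer_text)))
qg_target = record["question"]
print(qg_input, "->", qg_target)
```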
“…Chan et al. build a recurrent BERT that outputs one question word per recurrent step [11,12], but this is time-consuming. Generative pretrained models such as UNILM [18], T5 [57], PEGASUS [82], and UNILMv2 [5] report QG scores after fine-tuning on the SQuAD [58] dataset, but they do not explore building a unified QG model.…”
Section: Related Work 2.1 Question Generation (mentioning)
confidence: 99%
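To illustrate the text-to-text setup such seq2seq models use for QG, here is a minimal sketch with Hugging Face Transformers. The "answer: ... context: ..." prompt format and the example text are assumptions for illustration, not the exact recipe of the cited papers, and the off-the-shelf "t5-small" checkpoint has not been fine-tuned for QG, so the output only demonstrates the interface; fine-tuning would additionally pass the tokenized reference question as labels.

```python
# Sketch of casting a SQuAD-style (context, answer) pair into a text-to-text
# QG input for a T5-style model. Prompt format and checkpoint are illustrative
# assumptions; t5-small is not QG-fine-tuned.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

context = ("SQuAD was released in 2016 and contains over 100,000 "
           "question-answer pairs on Wikipedia articles.")
answer = "over 100,000"
prompt = f"answer: {answer} context: {context}"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```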
“…We conduct experiments on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018). We compare our method with the baseline methods on two single-sentence classification tasks (CoLA (Warstadt et al., 2018), SST-2 (Socher et al., 2013)), two similarity and paraphrase tasks (MRPC (Dolan & Brockett, 2005), QQP (Chen et al., 2018)), and three inference tasks (MNLI (Williams et al., 2018), QNLI (Rajpurkar et al., 2016), RTE (Dagan et al., 2005; Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009)) 1 . We report accuracy for MNLI, QNLI, QQP, SST-2, and RTE, F1 for MRPC, and Matthews correlation for CoLA.…”
Section: Setup (mentioning)
confidence: 99%
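The per-task metric choices named in this setup map onto standard library calls; a minimal sketch using scikit-learn, with invented placeholder predictions and labels (in the benchmark these would come from the fine-tuned model and each task's dev set):

```python
# Minimal sketch of the three GLUE metrics named above, computed with
# scikit-learn. y_true and y_pred are invented placeholders for illustration.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # gold labels (placeholder)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (placeholder)

print("accuracy (MNLI/QNLI/QQP/SST-2/RTE):", accuracy_score(y_true, y_pred))
print("F1 (MRPC):", f1_score(y_true, y_pred))
print("Matthews correlation (CoLA):", matthews_corrcoef(y_true, y_pred))
```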