Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
DOI: 10.18653/v1/n18-1023
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

Abstract: We present a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences. We solicit and verify questions and answers for this challenge through a 4-step crowdsourcing experiment. Our challenge dataset contains ∼6k questions for +800 paragraphs across 7 different domains (elementary school science, news, travel guides, fiction stories, etc.), bringing linguistic diversity to the texts and to the questions' wordings. On a subset of our datas…



Cited by 316 publications (319 citation statements). References 22 publications.
“…We evaluated ROCC coupled with the proposed QA approach on two QA datasets. We use the standard train/development/test partitions for each dataset, as well as the standard evaluation measures: accuracy for ARC, and F1_m (macro-F1 score), F1_a (micro-F1 score), and EM0 (exact match) for MultiRC (Khashabi et al., 2018a).…”
Section: Empirical Evaluation (mentioning)
Confidence: 99%
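
The measures named in this statement reflect MultiRC's multi-label setup, where each question has several answer options and more than one can be correct. Below is a minimal sketch of how such scores could be computed, assuming predictions are 0/1 labels per answer option; the function names (binary_f1, multirc_scores) are illustrative and this is not the official MultiRC evaluation script.

# Minimal sketch of the MultiRC-style measures named above, assuming each
# question is a pair (gold_labels, pred_labels) of 0/1 lists over its
# answer options. Illustrative only; not the official evaluation script.

def binary_f1(gold, pred):
    # F1 over parallel 0/1 label lists.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gold)
    return 2 * precision * recall / (precision + recall)

def multirc_scores(questions):
    # F1_m (macro): F1 computed per question, then averaged over questions.
    f1_m = sum(binary_f1(g, p) for g, p in questions) / len(questions)
    # F1_a (micro): F1 over all answer options pooled across questions.
    all_gold = [label for g, _ in questions for label in g]
    all_pred = [label for _, p in questions for label in p]
    f1_a = binary_f1(all_gold, all_pred)
    # EM0 (exact match): share of questions with every option labeled correctly.
    em0 = sum(1 for g, p in questions if g == p) / len(questions)
    return f1_m, f1_a, em0

# Example: two questions; the first scored perfectly, the second partially.
print(multirc_scores([([1, 0, 1], [1, 0, 1]), ([1, 1, 0], [1, 0, 0])]))

The macro/micro distinction matters here because questions have varying numbers of answer options: F1_m weights every question equally, while F1_a weights every answer option equally.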
“…There exist several other multi-hop reasoning datasets, including WorldTree, OpenBookQA (Mihaylov et al., 2018), and MultiRC (Khashabi et al., 2018). These datasets are more complex to analyze since the answers may not appear directly in the passage and may simply be entailed by passage content.…”
Section: Discussion (mentioning)
Confidence: 99%
“…The difference between QAngaroo and our focus is two-fold: (1) QAngaroo does not have supervised evidence, and (2) the questions in QAngaroo are inherently limited because the dataset is constructed using a knowledge base. MultiRC (Khashabi et al., 2018) is also an explainable multi-hop QA dataset that provides gold evidence sentences. However, it is difficult to compare the performance of evidence extraction with other studies because its evaluation script and leaderboard do not report the evidence extraction score.…”
Section: Reading Comprehension (mentioning)
Confidence: 99%