Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1075

Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability

Abstract: Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems. In this study, two classes of metrics were adopted for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and highlighted the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggests that the readability of RC datasets…
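To make the readability class concrete, the sketch below computes the Flesch-Kincaid grade level, one widely used readability formula, using a naive vowel-group syllable heuristic. This is an illustrative assumption, not the authors' exact tooling or metric set.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

if __name__ == "__main__":
    # Hypothetical passage, only to show how a score would be reported per text.
    passage = ("The committee deliberated extensively before reaching a unanimous "
               "conclusion. Its recommendation was adopted without amendment.")
    print(f"Flesch-Kincaid grade level: {flesch_kincaid_grade(passage):.1f}")
```

A higher grade level indicates text that is harder to read; applying such a formula to every passage in a dataset gives one of the readability-style measurements the abstract refers to.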

Cited by 32 publications (36 citation statements) · References 25 publications
“…For example, Weston et al (2015) defined 20 skills as a set of toy tasks. Sugawara et al (2017) also organized 10 prerequisite skills for MRC. LoBue and Yates (2011) and Sammons et al (2010) analyzed entailment phenomena using detailed classifications in RTE.…”
Section: Related Work
confidence: 99%
“…Finally, we will describe the human analog of the models' strategy, followed by our conclusions. Sugawara et al (2017) evaluated various datasets, in particular SQuAD, to determine how many human reading skills were required to answer questions. They described SQuAD as "difficult to read but easy to answer" for humans, finding that SQuAD requires only a few simple skills.…”
Section: Text Organization
confidence: 99%
“…Reading the passage will prime the question creators towards questions based on interrogative paraphrases of the passage. As noted by Sugawara et al (2017), "SQuAD was difficult to read," which should further magnify this effect: when the passage is hard to read, it is easier and faster to scan it for a sentence stating a fact and to reformulate that sentence as a question. In particular, since crowdworkers are not motivated by a genuine need for information, we can expect them to use the first question that came to mind.…”
Section: Priming During Data Collection
confidence: 99%
“…1 Support for this is given in Sugawara et al (2017), who show that Who-did-what dataset, for example, requires on average a larger number of reading skills than SQuAD (Rajpurkar et al, 2016) and MCTest (Richardson et al, 2013).…”
Section: Related Datasets
confidence: 99%
“…In annotating the skills, we followed the categorization by Sugawara et al (2017). Bridging: inference through grammatical and lexical knowledge (synonymy, idioms etc).…”
Section: B List Of Skills With Selected Examples
confidence: 99%