NewsQA: A Machine Comprehension Dataset

Trischler, Adam; Wang, Tong; Yuan, Xingdi; Harris, Justin; Sordoni, Alessandro; Bachman, Philip; Suleman, Kaheer

doi:10.18653/v1/w17-2623

Cited by 582 publications

(566 citation statements)

References 19 publications

Supporting

Mentioning

548

Contrasting

Unclassified


“…NewsQA (Trischler et al, 2016): we randomly chose questions that satisfied the following conditions:…”

Section: A Sampling Methods For Questions (mentioning)

confidence: 99%

Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability

Sugawara, Kido, Yokono et al. 2017

Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers)


Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems. In this study, two classes of metrics were adopted for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and highlighted the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggests that the readability of RC datasets does not directly affect the question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer.


“…NewsQA The NewsQA dataset (Trischler et al, 2017) contains 100k answerable questions from a total of 120k questions. The dataset is built from CNN news stories that were originally collected by Hermann et al (2015).…”

Section: Methods (mentioning)

confidence: 99%

“…A thorough analysis by Chen et al (2016), however, revealed that the DailyMail/CNN was too easy and still quite noisy. New datasets were constructed to eliminate these problems including SQuAD (Rajpurkar et al, 2016), NewsQA (Trischler et al, 2017) and MsMARCO (Nguyen et al, 2016).…”

Section: Related Work (mentioning)

confidence: 99%

“…The creation of large-scale, extractive QA datasets (Rajpurkar et al, 2016; Trischler et al, 2017; Nguyen et al, 2016) sparked research interest into the development of end-to-end neural QA systems. A typical neural architecture consists of an embedding-, encoding-, interaction- and answer layer (Wang and Jiang, 2017; Yu et al, 2017; Xiong et al, 2017; Seo et al, 2017; Yang et al, 2017;.…”

Section: Introduction (mentioning)

confidence: 99%


Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Lévy, Specia 2017


“…Datasets with natural language questions include MCTest (Richardson et al, 2013), SQuAD (Rajpurkar et al, 2016), and NewsQA (Trischler et al, 2016). MCTest is limited in scale with only 2640 multiple choice questions.…”

Section: Reading Comprehension (mentioning)

confidence: 99%

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Joshi, Choi, Weld et al. 2017

Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers)


We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network, that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study.
