Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1147

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Abstract: We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross-sentence reasoning to find answers.
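
The abstract's central mechanism, pairing each question-answer pair with every independently gathered document that mentions the answer, is easy to illustrate. Below is a minimal sketch of that distant-supervision pairing; the `Triple` type and the function name are illustrative, not from the TriviaQA release.

```python
# Minimal sketch of the distant-supervision idea from the abstract:
# an evidence document counts as (noisy) support for a question if the
# answer string appears somewhere in its text. All names here are
# illustrative, not part of the TriviaQA release.
from dataclasses import dataclass

@dataclass
class Triple:
    question: str
    answer: str
    evidence: str  # one gathered evidence document

def distantly_supervised(question: str, answer: str,
                         documents: list[str]) -> list[Triple]:
    """Pair a QA pair with every evidence document that mentions the answer."""
    answer_lc = answer.lower()
    return [
        Triple(question, answer, doc)
        for doc in documents
        if answer_lc in doc.lower()  # naive string match; real pipelines normalize more
    ]

docs = [
    "Harper Lee wrote To Kill a Mockingbird, published in 1960.",
    "The novel is set in the fictional town of Maycomb, Alabama.",
]
triples = distantly_supervised(
    "Which author wrote To Kill a Mockingbird?", "Harper Lee", docs
)
print(len(triples))  # 1: only the first document mentions the answer
```

Because the match is purely lexical, some retained documents will mention the answer without actually supporting it; that noise is exactly why the paper calls this distant rather than direct supervision.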

Cited by 1,074 publications (1,077 citation statements)
References 20 publications
“…These subjects are medicine (4k questions), history (3k questions), and biology (2k questions). The resulting dataset is somewhat similar to the TriviaQA dataset; however, the domains are different [8].…”
Section: Multiple Choice Question Answering (MCQA)
Mentioning confidence: 86%
“…TriviaQA (Joshi et al., 2017) contains automatically collected question-answer pairs from 14 trivia and quiz-league websites, together with web-crawled evidence documents from Wikipedia and Bing. While a majority of questions require world knowledge to find the correct answer, it is mostly factual knowledge.…”
Section: Related Work
Mentioning confidence: 99%
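
For readers who want to inspect the corpus this statement describes, here is a minimal loading sketch. It assumes the Hugging Face `datasets` mirror of TriviaQA; the dataset id `trivia_qa`, the `rc` configuration name, and the field layout are assumptions about that mirror, not details stated in this report.

```python
# Sketch: peeking at TriviaQA through the Hugging Face hub mirror.
# The id "trivia_qa", the "rc" (reading comprehension) configuration,
# and the field names below are assumptions about that mirror.
from datasets import load_dataset

train = load_dataset("trivia_qa", "rc", split="train")
example = train[0]
print(example["question"])         # trivia question authored by enthusiasts
print(example["answer"]["value"])  # canonical answer string
print(len(train))                  # number of training examples
```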
“…CoQA [17] is a large-scale reading comprehension dataset whose questions depend on a conversation history. TriviaQA [21] and SQuAD 2.0 [9] focus on complex reasoning questions, which must be answered by jointly inferring over multiple sentences. Compared with English datasets, Chinese reading comprehension datasets are quite rare.…”
Section: Reading Comprehension Datasets
Mentioning confidence: 99%