Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1156

DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension

Abstract: We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7,680 pairs of movie plots, where each pair in the collection reflects two versions of the same movie, one from Wikipedia and the other from IMDb, written by two different authors. We asked crowdsourced workers to create questions from one …

Cited by 89 publications (93 citation statements)
References 17 publications
“…We further test our model on the newly-released DuoRC dataset (Saha et al, 2018). This dataset contains two subsets: movie descriptions collected from Wikipedia (SelfRC) and from IMDB (ParaphraseRC).…”
Section: Results on DuoRC (mentioning)
confidence: 99%
“…Note that the answers of the same question could be different in the two subsets (only 40.7% of the questions have the same answers in both domains). We preprocess the dataset and test the answer-span extraction task following Saha et al (2018). Results are reported in Table 3.…”
Section: Results on DuoRC (mentioning)
confidence: 99%
“…As questions are not proposed directly from documents, this task is challenging and some information extraction methods fail to deal with it. This methodology of creating MRC datasets enlightens lots of other researches [77,52,69]. In order to avoid that questions can be answered by knowledge out of the documents, all entities in documents are anonymized by random markers.…”
Section: CNN and Daily Mail (mentioning)
confidence: 99%
“…One way to reduce lexical overlap between the question and passage is to expose the author of the question to a different passage that conveys a similar meaning. Examples include NARRATIVEQA (Kočiský et al, 2018), where question authors were shown a summary of a movie script that will be used for answering questions, and DUORC (Saha et al, 2018), where questions are authored given a passage that is comparable to the one that will later be employed.…”
Section: Question / Passage Mismatch (mentioning)
confidence: 99%