Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/P18-2124

Know What You Don’t Know: Unanswerable Questions for SQuAD

Abstract: Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD 2.0 is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on SQuAD 2.0.
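As a brief illustration (not part of the original abstract), the sketch below loads SQuAD 2.0 with the Hugging Face datasets library, assuming that library is installed and the hub identifier squad_v2 is available; in this release, unanswerable questions are the examples whose answer list is empty, which is what a system must learn to abstain on.

```python
# Illustrative sketch: load SQuAD 2.0 and separate answerable from
# unanswerable questions (unanswerable examples have an empty answer list).
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")          # splits: "train", "validation"
validation = squad_v2["validation"]

unanswerable = validation.filter(lambda ex: len(ex["answers"]["text"]) == 0)
answerable = validation.filter(lambda ex: len(ex["answers"]["text"]) > 0)

print(f"validation examples: {len(validation)}")
print(f"answerable: {len(answerable)}  unanswerable: {len(unanswerable)}")
```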



Cited by 1,563 publications (1,493 citation statements) | References: 19 publications
“…We evaluated BERT+Entity in the natural language understanding benchmark GLUE (Wang et al., 2018), the question answering (QA) benchmarks SQUAD V2 (Rajpurkar et al., 2018) and SWAG (Zellers et al., 2018), and the machine translation benchmark EN-DE WMT14. [Footnote 1: TagMe's performance on various benchmark datasets ranges from 37% to 72% F1 (Kolitsas et al., 2018).] We confirm the finding from Zhang et al. (2019) that additional entity knowledge is not beneficial for the GLUE benchmark.…”
Section: Introduction (mentioning)
confidence: 99%
“…We leverage an entailment model and a QA model based on BERT [9]. For the entailment model, as the SQuAD 2.0 dataset [35] contains unanswerable questions, we utilize it to train a classifier which tells us whether a pair of <question, answer> matches with the content in the input passage. For the question answering model, we fine-tuned another BERT-based QA model utilizing the SQuAD 1.1 dataset [36].…”
Section: Data Filtering for Quality Control (mentioning)
confidence: 99%
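The filtering step quoted above can be approximated with off-the-shelf tooling. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and a publicly available BERT checkpoint fine-tuned on SQuAD 2.0 (the model name is an assumption, not the checkpoint used in the cited work); the handle_impossible_answer flag lets the pipeline return an empty answer when the passage does not support one, which can serve as the signal for discarding a <question, answer> pair.

```python
# Minimal sketch of answerability-based filtering, assuming a public
# SQuAD 2.0 BERT checkpoint (the model name below is an assumption).
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/bert-base-cased-squad2",  # assumed SQuAD 2.0 checkpoint
)

passage = "SQuAD 2.0 combines answerable and unanswerable questions."
question = "How many questions does SQuAD 2.0 contain?"

# handle_impossible_answer lets the pipeline return an empty string when
# no span in the passage is supported; such pairs can then be dropped.
result = qa(question=question, context=passage, handle_impossible_answer=True)
keep_pair = bool(result["answer"].strip())
print(result, keep_pair)
```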
“…We use exactly the same format as the popular SQuAD2.0 [27] dataset for our preprocessing output. We keep all questions and answers for a random sample of 25% of the documents as a separate hold-out set.…”
Section: Data Preprocessing (mentioning)
confidence: 99%
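A document-level hold-out like the one described above can be produced with a few lines of standard-library Python. The sketch below assumes the preprocessed data is already stored in the SQuAD 2.0 JSON layout ({"version": ..., "data": [{"title": ..., "paragraphs": [...]}]}); the file names are placeholders, and only the 25% document-level split follows the cited description.

```python
# Sketch: split SQuAD-2.0-format data into a 25% document-level hold-out.
import json
import random

with open("dataset_squad_format.json") as f:   # assumed input file name
    data = json.load(f)["data"]                # one entry per document

random.seed(0)
random.shuffle(data)                           # shuffle whole documents
split = int(len(data) * 0.25)
holdout, train = data[:split], data[split:]

for name, subset in [("holdout.json", holdout), ("train.json", train)]:
    with open(name, "w") as f:
        json.dump({"version": "v2.0", "data": subset}, f)
```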
“…In this section, we discuss how state-of-the-art models for answer selection perform on the DQA data and DQA enhanced with data from the SQuAD2.0 dataset [27]. We select this dataset for two reasons: first, it is a standard dataset for benchmarking Question Answering tasks and, second, like DQA, it contains questions marked as unanswerable, making it closely compatible with our collected data.…”
Section: Answer Selection (mentioning)
confidence: 99%