2020
DOI: 10.1609/aaai.v34i05.6319

QASC: A Dataset for Question Answering via Sentence Composition

Abstract: Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The …
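To make the two-fact composition structure concrete, here is a minimal sketch of inspecting a QASC item. It assumes the Hugging Face dataset id "allenai/qasc" and the field names (question, choices, answerKey, fact1, fact2, combinedfact), which should be verified against the dataset card before use.

```python
# Minimal sketch: load QASC and print one item's annotated facts.
# Dataset id and field names are assumptions, not confirmed by this page.
from datasets import load_dataset

qasc = load_dataset("allenai/qasc", split="train")
ex = qasc[0]

print(ex["question"])  # multiple-choice question stem
for label, text in zip(ex["choices"]["label"], ex["choices"]["text"]):
    print(f"  ({label}) {text}")
print("gold:", ex["answerKey"])

# The two corpus facts annotated for composition, and their
# human-written composed fact:
print("fact1:", ex["fact1"])
print("fact2:", ex["fact2"])
print("composed:", ex["combinedfact"])
```

Note that the decomposition into fact1 and fact2 is annotated in the data but, per the abstract, is deliberately not recoverable from the question text alone.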

Cited by 120 publications (154 citation statements)
References 18 publications
“…Multiple-choice QA (MC). We use the following MC datasets: MCTest (Richardson et al., 2013), RACE (Lai et al., 2017), OpenBookQA/OBQA (Mihaylov et al., 2018), ARC (Clark et al., 2018, 2016), QASC (Khot et al., 2019), CommonsenseQA/CQA (Talmor et al., 2019), PIQA (Bisk et al., 2020), SIQA (Sap et al., 2019), and Winogrande (Sakaguchi et al., 2020). Several of the MC datasets do not come with accompanying paragraphs (such as ARC, QASC, OBQA).…”
Section: Datasets
confidence: 99%
“…The first few rows of the table show T5 models trained for individual formats, followed by UNIFIEDQA. For completeness, we include the highest previous scores for each dataset; one must be careful when reading these numbers, as the best previous numbers follow the fully supervised protocol (for NewsQA (Zhang et al., 2020), Quoref (Segal et al., 2019), DROP (Lan et al., 2019), ROPES (Lin et al., 2019), QASC (Khot et al., 2019), CommonsenseQA (Zhu et al., 2020), and x-CS datasets (Gardner et al., 2020)).…”
Section: Generalization to Unseen Datasets
confidence: 99%
“…2017), we extend their architecture with a hierarchy-like structure of bidirectional LSTM (BiLSTM) layers with max pooling. All in all, our model improves the previous state of the art for SciTail (Khot, Sabharwal, and Clark 2018) and achieves strong results for the Stanford Natural Language Inference (SNLI) and Multi-Genre Natural Language Inference corpus (MultiNLI; Williams, Nangia, and Bowman 2018).…”
Section: Introduction
confidence: 55%
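For readers unfamiliar with the architecture the excerpt above describes, below is a minimal PyTorch sketch of a bidirectional LSTM sentence encoder with max pooling over time. The class name, dimensions, and the sentence-pair combination noted in the comment are illustrative assumptions, not the cited paper's exact implementation.

```python
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """Sentence encoder: BiLSTM over token embeddings, max-pooled
    over the time axis (hypothetical rendering of the building block
    described in the excerpt above)."""
    def __init__(self, vocab_size: int, embed_dim: int = 300, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden_dim)
        return h.max(dim=1).values               # max pooling over time

# An NLI head would typically combine premise vector u and hypothesis
# vector v as [u; v; |u - v|; u * v] before a small classifier.
```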
“…SciTail: SciTail (Khot et al. 2018) is an NLI dataset created from multiple-choice science exams, consisting of 27k sentence pairs. Each question and the correct answer choice have been converted into an assertive statement to form the hypothesis.…”
Section: Evaluation Benchmarks
confidence: 99%