NewsQA: A Machine Comprehension Dataset

Trischler, Adam; Wang, Tong; Yuan, Xingdi; Harris, Justin A.; Sordoni, Alessandro; Bachman, Philip; Suleman, Kaheer

doi:10.48550/arxiv.1611.09830

Cited by 82 publications

(137 citation statements)

References 0 publications

Supporting

Mentioning

111

Contrasting

Unclassified

Order By: Relevance

“…In principle, as a dual task of QA, any QA datasets can be used for QG [50]. SQuAD [58], MS-MARCO [4] and newsQA [73] are three famous datasets used for answer-extraction QG, collected from Wikipedia, Bing search logs, and CNN news respectively. Unlike the previous three datasets, Nar-rativeQA [35] does not restrict the answers to be the span of texts in the articles, therefore, it can be used as an answer-abstraction QG dataset.…”

Section: Related Work 21 Question Generationmentioning

confidence: 99%

Unified Question Generation with Continual Lifelong Learning

Yuan,

Yin,

et al. 2022

Preprint

View full text Add to dashboard Cite

Question Generation (QG), as a challenging Natural Language Processing task, aims at generating questions based on given answers and context. Existing QG methods mainly focus on building or training models for specific QG datasets. These works are subject to two major limitations: (1) They are dedicated to specific QG formats (e.g., answer-extraction or multi-choice QG), therefore, if we want to address a new format of QG, a re-design of the QG model is required. (2) Optimal performance is only achieved on the dataset they were just trained on. As a result, we have to train and keep various QG models for different QG datasets, which is resource-intensive and ungeneralizable.To solve the problems, we propose a model named Unified-QG based on lifelong learning techniques, which can continually learn QG tasks across different datasets and formats. Specifically, we first build a format-convert encoding to transform different kinds of QG formats into a unified representation. Then, a method named STRIDER (SimilariT y RegularI zed Difficult Example Replay) is built to alleviate catastrophic forgetting in continual QG learning. Extensive experiments were conducted on 8 QG datasets across 4 QG formats (answer-extraction, answer-abstraction, multi-choice, and boolean QG) to demonstrate the effectiveness of our approach. Experimental results demonstrate that our Unified-QG can effectively and continually adapt to QG tasks when datasets and formats vary. In addition, we verify the ability of a single trained Unified-QG model in improving 8 Question Answering (QA) systems' performance through generating synthetic QA data.

show abstract

Section: Related Work 21 Question Generationmentioning

confidence: 99%

Unified Question Generation with Continual Lifelong Learning

Yuan,

Yin,

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…The famous SQuAD dataset [27,26] for the first time introduces human-generated free-form questions, which requires the machine to understand natural language to select the correct span in Wikipedia pages. Similar datasets follow this trend of using free-form questions and adopt reading documents from a variety of sources, such as news articles [37,17] and dialogues [18,28,4]. In addition to these datasets where the answers can be directly extracted from the document, another popular type of datasets, i.e., abstractive datasets, ask the reader to generate an answer that may not be found in the given context [23,12].…”

Section: Related Workmentioning

confidence: 99%

Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

Xu¹,

Yi-chen²,

Yi³

et al. 2021

Preprint

View full text Add to dashboard Cite

We present Native Chinese Reader (NCR), a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese. NCR is collected from the exam questions for the Chinese course in China's high schools, which are designed to evaluate the language proficiency of native Chinese youth. Existing Chinese MRC datasets are either domain-specific or focusing on short contexts of a few hundreds of characters in modern Chinese only. By contrast, NCR contains 8390 documents with an average length of 1024 characters covering a wide range of Chinese writing styles, including modern articles, classical literature and classical poetry. A total of 20477 questions on these documents also require strong reasoning abilities and common sense to figure out the correct answers. We implemented multiple baseline models using popular Chinese pretrained models and additionally launched an online competition using our dataset to examine the limit of current methods. The best model achieves 59% test accuracy while human evaluation shows an average accuracy of 79%, which indicates a significant performance gap between current MRC models and native Chinese speakers. We release the dataset at https://sites.google.com/ view/native-chinese-reader/.

show abstract

“…The original version of SQuAD [5] was published in 2016. There are also a variety of benchmarks related to question-answering -NewsQA [12], SearchQA [13], TriviaQA [14], HotpotQA [15], and Natural Questions [16]. Nonetheless, SQuAD is one of the most commonly used question-answering benchmarks.…”

Section: Related Workmentioning

confidence: 99%

When to Fold'em: How to answer Unanswerable questions

Ho,

Zhou,

2021

Preprint

View full text Add to dashboard Cite

We present 3 different question-answering models trained on the SQuAD2.0 dataset [1]-BIDAF [2], DocumentQA [3] and ALBERT Retro-Reader [4]-demonstrating the improvement of language models in the past 3 years. Through our research in fine-tuning pre-trained models for question-answering, we developed a novel approach capable of achieving a 2% point improvement in SQuAD2.0 F1 in reduced training time. Our method of re-initializing select layers of a parameter-shared language model is simple yet empirically powerful. Models and Methods BIDAFIn 2016, Bi-Directional Attention Flow (BIDAF) [2] was introduced. Based on the function of different layers, we divide 6 layers into 3 parts. There are 3 layers in the first part, Character Embedding, Word Embeddin, and Contextual Embedding Layer. They embed the words from both query and context to two attention flow, H ∈ R 2d×T for context and U ∈ R 2d×J for query, where T and J represents the number of words in the input context and query.

show abstract

NewsQA: A Machine Comprehension Dataset

Cited by 82 publications

References 0 publications

Unified Question Generation with Continual Lifelong Learning

Unified Question Generation with Continual Lifelong Learning

Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

When to Fold'em: How to answer Unanswerable questions

Contact Info

Product

Resources

About