Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.439

End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

Abstract: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by finetuning a pretrained LM …
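
The abstract describes a compact pipeline: one encoder-decoder generates a question-answer pair from a passage, and the generation likelihood doubles as the filtering score. The sketch below illustrates that idea with the Hugging Face transformers API; t5-base is only a placeholder checkpoint (the paper fine-tunes its own QAGen model, which is not named here), and the sampling settings are illustrative assumptions rather than the authors' configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint (assumption); a QAGen-style fine-tuned model
# would be needed to actually emit question-answer pairs.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base").eval()

def generate_and_score(passage: str, num_samples: int = 5):
    """Sample joint question-answer sequences for a passage and score each
    by the generator's own mean token log-likelihood, mirroring the
    'likelihood as filtering score' idea from the abstract."""
    inputs = tokenizer(passage, return_tensors="pt", truncation=True)
    with torch.no_grad():
        sequences = model.generate(
            **inputs,
            do_sample=True,
            top_p=0.95,
            max_new_tokens=64,
            num_return_sequences=num_samples,
        )
    scored = []
    for seq in sequences:
        # generate() prepends the decoder start token; drop it and mask
        # padding so neither contributes to the likelihood.
        labels = seq[1:].clone().unsqueeze(0)
        labels[labels == tokenizer.pad_token_id] = -100
        with torch.no_grad():
            # loss is the mean per-token negative log-likelihood of this
            # candidate; its negation serves as the filter score.
            nll = model(input_ids=inputs.input_ids, labels=labels).loss
        text = tokenizer.decode(seq, skip_special_tokens=True)
        scored.append((text, -nll.item()))
    # Highest-likelihood pairs first; low-scoring pairs can be dropped
    # without training a separate filtering model.
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Sorting by this score and keeping the top pairs reflects the abstract's point that no separate filtering model is needed; in practice a score threshold tuned on held-out data would likely replace the simple sort.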

Cited by 49 publications (74 citation statements)
References 22 publications
“…In order to enable the use of a QA-driven metric to maximize factual correctness during the training of summarization models, we propose QUALS (QUestion Answering with Language model score for Summarization), which is illustrated in the bottom part of Figure 1. QUALS is an efficient metric that employs a single neural language model (QAGen), as proposed in (Shakeri et al., 2020), to generate both the questions and answers from the summary. In particular, given a summary, QAGen outputs a question-answer (q-a) pair jointly, separated by a special token <a>, as shown in Figure 2.…”
Section: QUALS (Ours)
Citation type: mentioning
confidence: 99%
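
To make the quoted mechanism concrete, here is a minimal sketch of the decoding-side convention the excerpt describes: the model emits the question and answer as one sequence, and the consumer splits it on the <a> separator. The helper name and the example string are hypothetical; only the <a> convention comes from the excerpt.

```python
def split_qa(generated: str, sep: str = "<a>"):
    """Split a jointly generated 'question <a> answer' string into its
    parts; the separator follows the excerpt above (the exact special
    token used by released models is an assumption here)."""
    if sep not in generated:
        return None  # malformed sample; a caller would typically drop it
    question, answer = generated.split(sep, 1)
    return question.strip(), answer.strip()

# Illustrative output string, not taken from the paper:
print(split_qa("who generates the q-a pair jointly? <a> QAGen"))
# -> ('who generates the q-a pair jointly?', 'QAGen')
```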
“…Such metrics are particularly problematic for QG evaluation since multiple plausible questions exist for a given passage and answer. Consequently, there has been a shift in focus to evaluating QG using an extrinsic evaluation that generates synthetic QA pairs for the purpose of evaluating their effectiveness as a data augmentation or unsupervised QA approach (Puri et al., 2020; Shakeri et al., 2020). Unsupervised QA: In unsupervised QA, the QA model is trained using synthetic data based on a QG model instead of an existing QA dataset.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Fabbri et al (2020) and propose template/rule-based methods for generating questions and employ retrieved paragraphs and cited passages as source passages to alleviate the problems of lexical similarities between passages and questions. ; Puri et al (2020); Shakeri et al (2020) additionally employ existing QA datasets to train a QG model. Although related, this work falls outside the scope of unsupervised QA.…”
Section: Related Workmentioning
confidence: 99%
“…Question generation (QG) (Liu et al., 2020; Sultan et al., 2020; Shakeri et al., 2020) has been widely explored in the reading comprehension (RC) task to reduce the burden of annotating large volumes of Q-A pairs given a context paragraph. Recently, Puri et al. (2020) used GPT-2 (Radford et al., 2019) to generate synthetic data for RC, showing that synthetic data alone is sufficient to obtain state-of-the-art results on the SQuAD 1.1 dataset.…”
Section: Related Work
Citation type: mentioning
confidence: 99%