Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.439

End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

Abstract: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by finetuning a pretrained LM …
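
The abstract describes a compact pipeline: one encoder-decoder generates a question-answer pair from a passage, and the generation likelihood doubles as the filtering score. The sketch below illustrates that idea with the Hugging Face transformers API; t5-base is only a placeholder checkpoint (the paper fine-tunes its own QAGen model, which is not named here), and the sampling settings are illustrative assumptions rather than the authors' configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint (assumption); a QAGen-style fine-tuned model
# would be needed to actually emit question-answer pairs.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base").eval()

def generate_and_score(passage: str, num_samples: int = 5):
    """Sample joint question-answer sequences for a passage and score each
    by the generator's own mean token log-likelihood, mirroring the
    'likelihood as filtering score' idea from the abstract."""
    inputs = tokenizer(passage, return_tensors="pt", truncation=True)
    with torch.no_grad():
        sequences = model.generate(
            **inputs,
            do_sample=True,
            top_p=0.95,
            max_new_tokens=64,
            num_return_sequences=num_samples,
        )
    scored = []
    for seq in sequences:
        # generate() prepends the decoder start token; drop it and mask
        # padding so neither contributes to the likelihood.
        labels = seq[1:].clone().unsqueeze(0)
        labels[labels == tokenizer.pad_token_id] = -100
        with torch.no_grad():
            # loss is the mean per-token negative log-likelihood of this
            # candidate; its negation serves as the filter score.
            nll = model(input_ids=inputs.input_ids, labels=labels).loss
        text = tokenizer.decode(seq, skip_special_tokens=True)
        scored.append((text, -nll.item()))
    # Highest-likelihood pairs first; low-scoring pairs can be dropped
    # without training a separate filtering model.
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Sorting by this score and keeping the top pairs reflects the abstract's point that no separate filtering model is needed; in practice a score threshold tuned on held-out data would likely replace the simple sort.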

Cited by 49 publications (74 citation statements)
References 22 publications
“…In order to enable the use of a QA-driven metric to maximize factual correctness during the training of summarization models, we propose QUALS (QUestion Answering with Language model score for Summarization), which is illustrated in the bottom part of Figure 1. QUALS is an efficient metric that employs a single neural language model (QAGen), as proposed in (Shakeri et al., 2020), to generate both the questions and answers from the summary. In particular, given a summary, QAGen outputs a question-answer (q-a) pair jointly, separated by a special token <a>, as shown in Figure 2.…”
Section: QUALS (Ours)
Citation type: mentioning
confidence: 99%
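
To make the quoted mechanism concrete, here is a minimal sketch of the decoding-side convention the excerpt describes: the model emits the question and answer as one sequence, and the consumer splits it on the <a> separator. The helper name and the example string are hypothetical; only the <a> convention comes from the excerpt.

```python
def split_qa(generated: str, sep: str = "<a>"):
    """Split a jointly generated 'question <a> answer' string into its
    parts; the separator follows the excerpt above (the exact special
    token used by released models is an assumption here)."""
    if sep not in generated:
        return None  # malformed sample; a caller would typically drop it
    question, answer = generated.split(sep, 1)
    return question.strip(), answer.strip()

# Illustrative output string, not taken from the paper:
print(split_qa("who generates the q-a pair jointly? <a> QAGen"))
# -> ('who generates the q-a pair jointly?', 'QAGen')
```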
“…Such metrics are particularly problematic for QG evaluation since multiple plausible questions exist for a given passage and answer. Consequently, there has been a shift in focus to evaluating QG using an extrinsic evaluation that generates synthetic QA pairs for the purpose of evaluating their effectiveness as a data augmentation or unsupervised QA approach (Puri et al., 2020; Shakeri et al., 2020). Unsupervised QA: In unsupervised QA, the QA model is trained using synthetic data based on a QG model instead of an existing QA dataset.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Fabbri et al (2020) and propose template/rule-based methods for generating questions and employ retrieved paragraphs and cited passages as source passages to alleviate the problems of lexical similarities between passages and questions. ; Puri et al (2020); Shakeri et al (2020) additionally employ existing QA datasets to train a QG model. Although related, this work falls outside the scope of unsupervised QA.…”
Section: Related Workmentioning
confidence: 99%
“…Question generation (QG) (Liu et al., 2020; Sultan et al., 2020; Shakeri et al., 2020) has been widely explored in the reading comprehension (RC) task to reduce the burden of annotating large volumes of Q-A pairs given a context paragraph. Recently, Puri et al. (2020) used GPT-2 (Radford et al., 2019) to generate synthetic data for RC, showing that synthetic data alone is sufficient to obtain state-of-the-art results on the SQuAD 1.1 dataset.…”
Section: Related Work
Citation type: mentioning
confidence: 99%