Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.566
Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Abstract: In this work, we introduce back-training, an alternative to self-training for unsupervised domain adaptation (UDA) from source to target domain. While self-training generates synthetic training data where natural inputs are aligned with noisy outputs, back-training results in natural outputs aligned with noisy inputs. This significantly reduces the gap between the target domain and synthetic data distribution, and reduces model overfitting to the source domain. We run UDA experiments on question generation and…
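The abstract's contrast between self-training and back-training can be summarized in a short sketch. The helper functions and toy models below are hypothetical stand-ins, not the paper's released code; the sketch only illustrates which side of each synthetic training pair is natural and which is model-generated.

```python
# Illustrative sketch (assumed helpers, not the authors' implementation) of the
# data-construction difference between self-training and back-training for UDA.

from typing import Callable, List, Tuple


def self_training_pairs(forward_model: Callable[[str], str],
                        target_inputs: List[str]) -> List[Tuple[str, str]]:
    """Self-training: NATURAL target-domain inputs paired with NOISY model outputs."""
    return [(x, forward_model(x)) for x in target_inputs]


def back_training_pairs(backward_model: Callable[[str], str],
                        target_outputs: List[str]) -> List[Tuple[str, str]]:
    """Back-training: NOISY model-generated inputs paired with NATURAL target-domain outputs."""
    return [(backward_model(y), y) for y in target_outputs]


if __name__ == "__main__":
    # Toy stand-ins for question generation: a passage->question model and an
    # inverse question->passage model (e.g. a retriever in the real setting).
    fake_qg = lambda passage: f"What does the following describe? {passage[:30]}..."
    fake_inverse = lambda question: f"(retrieved passage for) {question}"

    passages = ["Gradient boosting builds an ensemble of weak learners."]
    questions = ["How does gradient boosting combine weak learners?"]

    print(self_training_pairs(fake_qg, passages))        # natural input, noisy output
    print(back_training_pairs(fake_inverse, questions))  # noisy input, natural output
```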

Cited by 6 publications (7 citation statements) | References 41 publications
“…Chen et al [5] create a large-scale Educational QG dataset from KhanAcademy and TED-Ed data sources as learning and assessment tools for students. Kulshreshtha et al [17] also release a QG dataset comprising data-science questions to promote research in domain adaptation. Unlike our questions, the questions in Chen et al [5], Kulshreshtha et al [17] are static and not personalized to the student.…”
Section: Experiments and Results
confidence: 99%
“…BART is a Transformer autoencoder pre-trained to reconstruct text from noisy text inputs. For QG, it learns a conditional probability distribution P(q|r) to generate question q from reference solution r. We experiment with two pre-trained checkpoints: (a) the original BART-base checkpoint provided by the authors and (b) a BART model trained on the 50K MLQuestions dataset using the back-training algorithm [17]. The latter model is able to generate good-quality questions for the data-science domain, which is also our domain of interest in the Korbit ITS.…”
Section: Few-shot Question Generation (QG) Model
confidence: 99%
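For readers unfamiliar with the setup described in the statement above, the following is a minimal sketch of modelling P(q|r), i.e. generating a question q from a reference solution r, with a public BART checkpoint via Hugging Face transformers. It is not the Korbit or MLQuestions code; the checkpoint name and the example text are assumptions, and the raw BART-base checkpoint would need QG fine-tuning (or back-training) before its outputs resemble real questions.

```python
# Hedged sketch: question generation q ~ P(q | r) with a BART checkpoint.
from transformers import BartForConditionalGeneration, BartTokenizer

# Public base checkpoint; a checkpoint further adapted with back-training on
# MLQuestions would be loaded the same way by swapping the name here.
model_name = "facebook/bart-base"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical reference solution r from the data-science domain.
reference_solution = (
    "Principal component analysis projects data onto the directions of "
    "maximum variance, given by the top eigenvectors of the covariance matrix."
)

# Encode r and decode a candidate question q with beam search.
inputs = tokenizer(reference_solution, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
question = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(question)
```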
“…Moreover, many recent works (Kulshreshtha et al., 2021) find that the domain generalization ability of dense retrieval models is weak. Inspired by , we introduce two out-of-domain testing sets from the medical domain, cMedQA and cCOVID-News†, as separate testing sets (see Section 2.4).…”
Section: Introduction
confidence: 99%