Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
DOI: 10.18653/v1/2022.naacl-main.414

Consistency Training with Virtual Adversarial Discrete Perturbation

Abstract: Consistency training regularizes a model by enforcing that predictions on original and perturbed inputs are similar. Previous studies have proposed various augmentation methods for the perturbation but are limited in that they are agnostic to the training model. Thus, the perturbed samples may not aid regularization because the model can classify them easily. In this context, we propose an augmentation method that adds a discrete noise that would incur the highest divergence between predictions. This v…
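The core idea is that the discrete perturbation (e.g., a token replacement) is chosen adversarially with respect to the current model, so the consistency term stays informative. Below is a minimal PyTorch sketch of that idea, assuming a classifier `model` that maps a batch of token-id tensors to logits and a small candidate set of replacement token ids; the greedy single-token search, the candidate construction, and the unweighted loss are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def kl_between(p_logits, q_logits):
    # KL(p || q) between the softmax distributions of two logit tensors.
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return F.kl_div(q_log, p_log, log_target=True, reduction="batchmean")

def discrete_adversarial_input(model, input_ids, candidate_ids):
    # Greedily pick the single-token replacement that maximizes the
    # divergence between predictions on the clean and perturbed inputs.
    with torch.no_grad():
        clean_logits = model(input_ids)
        best_ids, best_div = input_ids, float("-inf")
        for pos in range(input_ids.size(1)):      # each sequence position
            for tok in candidate_ids:             # each candidate token id
                perturbed = input_ids.clone()
                perturbed[:, pos] = tok
                div = kl_between(clean_logits, model(perturbed)).item()
                if div > best_div:
                    best_ids, best_div = perturbed, div
    return best_ids

def consistency_loss(model, input_ids, candidate_ids):
    # Penalize divergence between predictions on the original input and
    # the adversarially perturbed one; the clean branch is detached so
    # gradients flow only through the perturbed prediction.
    adv_ids = discrete_adversarial_input(model, input_ids, candidate_ids)
    return kl_between(model(input_ids).detach(), model(adv_ids))
```

Note that this exhaustive search scales with sequence length times candidate-set size; any practical method would need a cheaper approximation than this brute-force sketch.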


citations
Cited by 9 publications
(5 citation statements)
references
References 55 publications
0
5
0
Order By: Relevance
“…After that, the neural network minimizes cross-entropy on labeled data and the KL-divergence under perturbation on all data. Similar ideas have been explored in (Cicek and Soatto, 2019; Kim et al., 2019; Park et al., 2022).…”
Section: Comparison With Virtual Adversarial Training (mentioning)
confidence: 58%
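For context, here is a compact sketch of the semi-supervised objective this snippet describes: cross-entropy on labeled data plus a KL consistency term under a perturbation on all data. The `perturb` callable (which could be the discrete adversarial search above) and the balancing weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def vat_style_objective(model, labeled_x, labels, unlabeled_x, perturb, alpha=1.0):
    # Supervised term on labeled inputs only.
    ce = F.cross_entropy(model(labeled_x), labels)

    # Consistency term on labeled and unlabeled inputs alike: the clean
    # prediction is treated as a fixed target (no gradient).
    all_x = torch.cat([labeled_x, unlabeled_x], dim=0)
    with torch.no_grad():
        clean_log = F.log_softmax(model(all_x), dim=-1)
    noisy_log = F.log_softmax(model(perturb(all_x)), dim=-1)
    kl = F.kl_div(noisy_log, clean_log, log_target=True, reduction="batchmean")
    return ce + alpha * kl
```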
“…Consistency training methods (Laine and Aila, 2017; Sajjadi et al., 2016; Wei and Zou, 2019; Ng et al., 2020; Xie et al., 2020) force the model to make consistent predictions against small perturbations. For example, Park et al. (2022) add discrete virtual adversarial noise to the token embeddings. … (Zhou et al., 2019) and unsupervised domain adaptation (UDA) (Wang et al., 2019; Long et al., 2022), depending on whether the target-domain data are labeled or unlabeled.…”
Section: Related Work (mentioning)
confidence: 99%
“…This kind of regularization technique has been widely adopted in NLP. For example, [59] add discrete virtual adversarial noise to the token embeddings. [121] apply mixup to perturb the spans of input texts for consistency training in text classification.…”
Section: B. Adversarial Learning (mentioning)
confidence: 99%
“…Many recent works use 'consistency regularisation' to improve the generalisation of fine-tuned pre-trained models, both multilingual and English-only (Jiang et al., 2020; Aghajanyan et al., 2020; Zheng et al., 2021; Park et al., 2021; Liang et al., 2021). These works encourage model outputs to be similar between a perturbed and a normal version of the input, usually by penalising the Kullback-Leibler (KL) divergence between the probability distributions of the perturbed and normal models.…”
Section: Introduction (mentioning)
confidence: 99%
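As one concrete instance of this family, the sketch below penalises the symmetric KL divergence between two stochastic (dropout-perturbed) forward passes of the same input, in the spirit of R-Drop-style consistency; it is a generic illustration of the loss shape described in the snippet above, not the exact loss of any single cited paper.

```python
import torch.nn.functional as F

def dropout_consistency_loss(model, x):
    # Two forward passes with the model in training mode, so dropout
    # produces two different "perturbed" views of the same input.
    logp1 = F.log_softmax(model(x), dim=-1)
    logp2 = F.log_softmax(model(x), dim=-1)
    # Symmetrised KL between the two output distributions.
    kl12 = F.kl_div(logp2, logp1, log_target=True, reduction="batchmean")
    kl21 = F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
    return 0.5 * (kl12 + kl21)
```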
“…They also rarely compare to traditional regularisation methods like dropout or L2 regularisation. Finally, such methods either require a complex adversarial training step (Jiang et al., 2020; Park et al., 2021) or the tuning of many hyper-parameters, such as the type of noise, the level of noise, and the weight given to the consistency loss term.…”
Section: Introduction (mentioning)
confidence: 99%