Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
DOI: 10.18653/v1/2022.naacl-main.414

Consistency Training with Virtual Adversarial Discrete Perturbation

Abstract: Consistency training regularizes a model by enforcing that predictions on original and perturbed inputs are similar. Previous studies have proposed various augmentation methods for the perturbation but are limited in that they are agnostic to the training model. Thus, the perturbed samples may not aid regularization because the model can classify them easily. In this context, we propose an augmentation method that adds a discrete noise that would incur the highest divergence between predictions. This v…
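The core idea is that the discrete perturbation (e.g., a token replacement) is chosen adversarially with respect to the current model, so the consistency term stays informative. Below is a minimal PyTorch sketch of that idea, assuming a classifier `model` that maps a batch of token-id tensors to logits and a small candidate set of replacement token ids; the greedy single-token search, the candidate construction, and the unweighted loss are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def kl_between(p_logits, q_logits):
    # KL(p || q) between the softmax distributions of two logit tensors.
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return F.kl_div(q_log, p_log, log_target=True, reduction="batchmean")

def discrete_adversarial_input(model, input_ids, candidate_ids):
    # Greedily pick the single-token replacement that maximizes the
    # divergence between predictions on the clean and perturbed inputs.
    with torch.no_grad():
        clean_logits = model(input_ids)
        best_ids, best_div = input_ids, float("-inf")
        for pos in range(input_ids.size(1)):      # each sequence position
            for tok in candidate_ids:             # each candidate token id
                perturbed = input_ids.clone()
                perturbed[:, pos] = tok
                div = kl_between(clean_logits, model(perturbed)).item()
                if div > best_div:
                    best_ids, best_div = perturbed, div
    return best_ids

def consistency_loss(model, input_ids, candidate_ids):
    # Penalize divergence between predictions on the original input and
    # the adversarially perturbed one; the clean branch is detached so
    # gradients flow only through the perturbed prediction.
    adv_ids = discrete_adversarial_input(model, input_ids, candidate_ids)
    return kl_between(model(input_ids).detach(), model(adv_ids))
```

Note that this exhaustive search scales with sequence length times candidate-set size; any practical method would need a cheaper approximation than this brute-force sketch.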


citations
Cited by 9 publications
(5 citation statements)
references
References 55 publications
0
5
0
Order By: Relevance
“…After that, the neural network minimizes cross-entropy on labeled data and the KL-divergence under perturbation on all data. Similar ideas have been explored in (Cicek and Soatto, 2019; Kim et al., 2019; Park et al., 2022).…”
Section: Comparison With Virtual Adversarial Training (mentioning)
confidence: 58%
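For context, here is a compact sketch of the semi-supervised objective this snippet describes: cross-entropy on labeled data plus a KL consistency term under a perturbation on all data. The `perturb` callable (which could be the discrete adversarial search above) and the balancing weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def vat_style_objective(model, labeled_x, labels, unlabeled_x, perturb, alpha=1.0):
    # Supervised term on labeled inputs only.
    ce = F.cross_entropy(model(labeled_x), labels)

    # Consistency term on labeled and unlabeled inputs alike: the clean
    # prediction is treated as a fixed target (no gradient).
    all_x = torch.cat([labeled_x, unlabeled_x], dim=0)
    with torch.no_grad():
        clean_log = F.log_softmax(model(all_x), dim=-1)
    noisy_log = F.log_softmax(model(perturb(all_x)), dim=-1)
    kl = F.kl_div(noisy_log, clean_log, log_target=True, reduction="batchmean")
    return ce + alpha * kl
```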
“…Consistency training methods (Laine and Aila, 2017; Sajjadi et al., 2016; Wei and Zou, 2019; Ng et al., 2020; Xie et al., 2020) force the model to make consistent predictions against small perturbations. For example, Park et al. (2022) add discrete virtual adversarial noise to the token embeddings. … (Zhou et al., 2019) and unsupervised domain adaptation (UDA) (Wang et al., 2019; Long et al., 2022), depending on whether the target-domain data are labeled or unlabeled.…”
Section: Related Work (mentioning)
confidence: 99%
“…This kind of regularization technique has been widely adopted in NLP. For example, [59] add discrete virtual adversarial noise to the token embeddings. [121] apply mixup to perturb the spans of input texts for consistency training in text classification.…”
Section: B. Adversarial Learning (mentioning)
confidence: 99%
“…Many recent works use 'consistency regularisation' to improve the generalisation of fine-tuned pre-trained models, both multilingual and English-only (Jiang et al., 2020; Aghajanyan et al., 2020; Zheng et al., 2021; Park et al., 2021; Liang et al., 2021). These works encourage model outputs to be similar between a perturbed and a normal version of the input, usually by penalising the Kullback-Leibler (KL) divergence between the probability distributions of the perturbed and normal models.…”
Section: Introduction (mentioning)
confidence: 99%
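As one concrete instance of this family, the sketch below penalises the symmetric KL divergence between two stochastic (dropout-perturbed) forward passes of the same input, in the spirit of R-Drop-style consistency; it is a generic illustration of the loss shape described in the snippet above, not the exact loss of any single cited paper.

```python
import torch.nn.functional as F

def dropout_consistency_loss(model, x):
    # Two forward passes with the model in training mode, so dropout
    # produces two different "perturbed" views of the same input.
    logp1 = F.log_softmax(model(x), dim=-1)
    logp2 = F.log_softmax(model(x), dim=-1)
    # Symmetrised KL between the two output distributions.
    kl12 = F.kl_div(logp2, logp1, log_target=True, reduction="batchmean")
    kl21 = F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
    return 0.5 * (kl12 + kl21)
```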
“…They also rarely compare to traditional regularisation methods like dropout or L2 regularisation. Finally, such methods either require a complex adversarial training step (Jiang et al., 2020; Park et al., 2021) or the tuning of many hyper-parameters, such as the type of noise, the level of noise, and the weight given to the consistency loss term.…”
Section: Introduction (mentioning)
confidence: 99%