2019
DOI: 10.48550/arxiv.1904.12848
Preprint

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy, et al.

Abstract: Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning…

Cited by 287 publications (572 citation statements)
References 33 publications
“…Virtual Adversarial Training (VAT) (Miyato et al, 2018) is an effective regularization technique that applies the slight input perturbations to which the model's predictions on unlabeled samples are most sensitive. More recent techniques like FixMatch (Sohn et al, 2020), MixMatch (Berthelot et al, 2019) and UDA (Xie et al, 2019) use data augmentations like flips, rotations, and crops to predict pseudo-labels. In this paper, we propose a new SSL technique that uses class-wise instantiations of SMI functions, which mitigates class imbalance in the selected subsets and is comparatively robust to OOD classes in the unlabeled set.…”
Section: Related Work
confidence: 99%
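The augmentation-based pseudo-labeling shared by UDA and FixMatch can be sketched compactly. Below is a minimal PyTorch-style illustration of a consistency loss on an unlabeled batch, assuming placeholder `weak_augment` and `strong_augment` callables (e.g. flip/crop versus RandAugment or back-translation); it captures the general recipe rather than any cited paper's exact loss.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, weak_augment, strong_augment,
                     threshold=0.8, temperature=0.4):
    """Sketch of a UDA/FixMatch-style consistency loss (assumed interface).

    Pseudo-labels come from a weakly augmented view; predictions on a
    strongly augmented view are pushed toward them.
    """
    with torch.no_grad():
        # Sharpened pseudo-label distribution from the weak view (no gradient).
        logits_weak = model(weak_augment(x_unlabeled))
        targets = F.softmax(logits_weak / temperature, dim=-1)
        confidence, _ = targets.max(dim=-1)
        mask = (confidence >= threshold).float()  # confidence-based masking

    # KL divergence between the target and the strong-view prediction,
    # counted only for confidently pseudo-labeled examples.
    logits_strong = model(strong_augment(x_unlabeled))
    log_probs = F.log_softmax(logits_strong, dim=-1)
    per_example_kl = F.kl_div(log_probs, targets, reduction="none").sum(dim=-1)
    return (per_example_kl * mask).mean()
```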
“…[25] systematically examined some basic augmentation methods, including random synonym replacement, word insertion, etc. [27] utilized TF-IDF to help determine which words to replace. [23] adopted k-nearest neighbors to find synonyms in word embedding space.…”
Section: Data Augmentation for Natural Language Processing (NLP)
confidence: 99%
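The TF-IDF-guided replacement attributed to [27] can be illustrated with a short sketch: low-TF-IDF (uninformative) words are preferred as replacement targets, so the keywords that carry the label are left intact. The `synonyms` dictionary and the function signature are assumptions made for this example, not an API from the cited work.

```python
import math
import random
from collections import Counter

def tfidf_guided_replace(tokens, corpus, synonyms, replace_frac=0.2):
    """Sketch of TF-IDF-guided word replacement (hypothetical interface).

    tokens:   list of words in the sentence to augment
    corpus:   list of tokenized documents used to estimate document frequency
    synonyms: dict mapping a word to a list of candidate substitutes
    """
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    term_freq = Counter(tokens)
    tfidf = {w: term_freq[w] * math.log(n_docs / (1 + doc_freq[w]))
             for w in set(tokens)}

    # Prefer replacing the least informative (lowest TF-IDF) words.
    candidates = [w for w in sorted(set(tokens), key=tfidf.get) if w in synonyms]
    n_replace = max(1, int(replace_frac * len(tokens)))
    chosen = set(candidates[:n_replace])
    return [random.choice(synonyms[w]) if w in chosen else w for w in tokens]
```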
“…Neural MWP solvers can benefit from our augmentation strategies in terms of generalization and the ability to deal with tiny local variances. Unlike other popular augmentation approaches [25,27,30], which may cause inconsistency between the questions and equations in the MWP task, our augmentation methods are carefully designed for the MWP task to ensure consistency.…”
Section: Introduction
confidence: 99%
“…Temporal Ensembling [19] uses previous model checkpoints, while Mean Teacher [26] uses an exponential moving average of model parameters. UDA [30] and ReMixMatch [18] sharpen the soft label to make the model produce high-confidence predictions. UDA further enforces consistency only when the highest probability of the predicted category distribution for soft labels is above a threshold.…”
Section: Semi-Supervised Learning
confidence: 99%
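Two ingredients named in this excerpt, the Mean Teacher EMA update and the sharpening of soft labels, fit in a few lines of PyTorch; the decay and temperature values below are illustrative defaults rather than the cited papers' settings.

```python
import torch

@torch.no_grad()
def update_ema_teacher(student, teacher, decay=0.999):
    """Mean Teacher-style update (sketch): teacher weights track an
    exponential moving average of the student weights."""
    for p_teacher, p_student in zip(teacher.parameters(), student.parameters()):
        p_teacher.mul_(decay).add_(p_student, alpha=1.0 - decay)

def sharpen(probs, temperature=0.5):
    """UDA/ReMixMatch-style temperature sharpening: temperatures below 1
    push the soft label toward a one-hot distribution."""
    powered = probs ** (1.0 / temperature)
    return powered / powered.sum(dim=-1, keepdim=True)
```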
“…However, the predictions made by the teacher could be noisy, especially at the beginning of the training process, which hinders the model from fitting the supervised loss. A solution is to filter out low-quality predictions with a confidence-based masking strategy [30]. Specifically, we maintain a confident node set V_C ⊆ V_U during training whose elements are unlabeled nodes with highly skewed predictions, i.e.,…”
Section: Additional Training Techniques
confidence: 99%
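A minimal sketch of that confidence-based masking step: an unlabeled node enters the confident set V_C only if its predicted class distribution is sufficiently skewed, i.e. its top probability exceeds a threshold. Tensor names and the threshold value are assumptions for illustration.

```python
import torch

def select_confident_nodes(probs_unlabeled, unlabeled_idx, threshold=0.9):
    """Build the confident node set V_C ⊆ V_U (sketch, hypothetical names).

    probs_unlabeled: (num_unlabeled, num_classes) softmax predictions
    unlabeled_idx:   (num_unlabeled,) indices of the unlabeled nodes V_U
    """
    top_prob, pseudo_label = probs_unlabeled.max(dim=-1)
    keep = top_prob >= threshold                 # highly skewed predictions only
    return unlabeled_idx[keep], pseudo_label[keep]
```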