mixup: Beyond Empirical Risk Minimization
Preprint, 2017
DOI: 10.48550/arxiv.1710.09412

Cited by 1,367 publications (1,684 citation statements)
References 14 publications
“…Here, we first discuss the benefits of fine-tuning deep networks trained by contrastive learning over training randomly initialized networks, on boosting robustness against noisy labels. Then, we show that the initial robustness provided by contrastive learning enables robust training methods that are effective under small to moderate amounts of noisy labels (Liu et al., 2020; Zhang et al., 2017; Mirzasoleiman et al., 2020) to achieve state-of-the-art performance under extreme noise levels.…”
Section: Fine-tuning the Network To Boost Performance
confidence: 87%
“…The initial level of robustness provided by contrastive learning can be leveraged by existing robust training methods to achieve superior performance under extreme noise levels. Next, we briefly discuss three methods that prevent the pre-trained network from overfitting the noisy labels, through regularization (Liu et al., 2020; Zhang et al., 2017) or identifying clean examples (Mirzasoleiman et al., 2020).…”
Section: Fine-tuning the Network To Boost Performance
confidence: 99%
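The regularization attributed to Zhang et al. (2017) here is the mixup scheme of the indexed paper: training on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta distribution. A minimal PyTorch sketch, assuming one-hot (or soft) label tensors and a user-chosen alpha:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=1.0):
    """Return mixup-augmented inputs and targets.

    x: input batch of shape (B, ...); y: one-hot/soft targets of shape (B, C).
    lam is drawn from Beta(alpha, alpha), as in Zhang et al. (2017).
    """
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)  # random pairing within the batch
    mixed_x = lam * x + (1.0 - lam) * x[index]
    mixed_y = lam * y + (1.0 - lam) * y[index]
    return mixed_x, mixed_y
```

The mixed batch is then fed through the network and the loss is computed against the mixed targets, which discourages the model from memorizing individual (possibly mislabeled) examples.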
“…We train different iterations for each task. We used random horizontal and vertical flips, 90° rotation, and MixUp [109] with probability 0.5 for data augmentation. We used the Adam optimizer [38] with an initial learning rate of 2×10⁻⁴, which is steadily decreased to 10⁻⁷ with cosine annealing decay [58].…”
Section: Methods
confidence: 99%
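A sketch of this training recipe in PyTorch, using the augmentations, learning rate, and schedule quoted above; the model, iteration count, and mixup alpha are placeholders not given in the citation:

```python
import random
import torch
from torch import nn, optim
from torchvision import transforms

# Augmentations quoted above: random horizontal/vertical flips and 90° rotations.
def random_rot90(img):
    # img: tensor (C, H, W); rotate by a random multiple of 90 degrees
    return torch.rot90(img, k=random.randint(0, 3), dims=(-2, -1))

# Assumes tensor images; insert transforms.ToTensor() earlier if loading PIL images.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.Lambda(random_rot90),
])

# MixUp applied with probability 0.5 per batch, as described above.
def maybe_mixup(x, y, alpha=0.2, prob=0.5):
    # alpha is not specified in the citation; 0.2 is a placeholder
    if random.random() > prob:
        return x, y
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]

# Adam with the quoted initial LR, decayed to 1e-7 with cosine annealing.
model = nn.Linear(3 * 32 * 32, 10)   # placeholder model (assumption)
optimizer = optim.Adam(model.parameters(), lr=2e-4)
total_steps = 100_000                # placeholder iteration count (assumption)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=1e-7)
```

With this setup, `scheduler.step()` would be called once per iteration so the learning rate reaches 10⁻⁷ at the final step.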
“…[26] 0.1; Drop path [17] 0.1 / 0.1 / 0.15 / 0.3; Repeated augment [15]; RandAugment [5]; Mixup prob. [40] 0.8; Cutmix prob. [39] 1.0; Erasing prob.…”
Section: Methods
confidence: 99%
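This fragment lists Mixup (0.8) and CutMix (1.0) side by side, as in common ViT-style training recipes. CutMix (Yun et al., 2019) mixes a pair by pasting a rectangular patch rather than interpolating pixels; a minimal sketch, assuming one-hot/soft targets, with the alpha value an assumed placeholder not taken from the fragment:

```python
import numpy as np
import torch

def cutmix_batch(x, y, alpha=1.0):
    """CutMix: paste a random box from a shuffled copy of the batch and
    mix the labels in proportion to the pasted area.

    x: (B, C, H, W); y: one-hot/soft targets (B, num_classes).
    """
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0), device=x.device)
    H, W = x.shape[-2:]
    # box side lengths chosen so the box covers roughly a (1 - lam) fraction of the image
    cut_h, cut_w = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
    x1, x2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
    x_mixed = x.clone()
    x_mixed[..., y1:y2, x1:x2] = x[idx][..., y1:y2, x1:x2]
    # recompute lam from the actual box area after clipping
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)
    y_mixed = lam * y + (1.0 - lam) * y[idx]
    return x_mixed, y_mixed
```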