2018
DOI: 10.48550/arxiv.1804.06872
Preprint

Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels

Abstract: Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore in this paper, we propose a new deep learning paradigm called "Co-teaching" for combating with noisy labels. Namely, we train two deep neural …

Cited by 55 publications (104 citation statements)
References 33 publications
“…Another family of methods focuses primarily on sample selection, where the model selects small-loss samples as "clean" samples under the assumption that the model first fits the clean samples before memorizing the noisy ones (also known as the early-learning assumption) (Arpit et al, 2017; Zhang et al, 2016). Han et al (2018) and Yu et al (2019) proposed Co-teaching, where sample selection is conducted using two networks to separate clean and noisy samples, and the clean samples are then used for further training. MentorNet (Jiang et al, 2018) is a student-teacher framework in which a pre-trained teacher network guides the learning of the student network with clean samples (whose labels are deemed "correct").…”
Section: Related Work
confidence: 99%
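The small-loss exchange described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are ours, the inputs are plain per-sample loss lists, and in the actual Co-teaching paper the keep ratio is annealed over epochs rather than fixed.

```python
def small_loss_selection(losses, keep_ratio):
    # Indices of the keep_ratio fraction of samples with the smallest loss,
    # treated as "clean" under the early-learning assumption.
    n_keep = int(len(losses) * keep_ratio)
    return sorted(range(len(losses)), key=lambda i: losses[i])[:n_keep]

def co_teaching_step(loss_a, loss_b, keep_ratio):
    # One Co-teaching exchange: each network picks its own small-loss
    # samples and hands them to its *peer* for that peer's update,
    # so the two networks filter errors for each other.
    clean_for_b = small_loss_selection(loss_a, keep_ratio)  # A teaches B
    clean_for_a = small_loss_selection(loss_b, keep_ratio)  # B teaches A
    return clean_for_a, clean_for_b
```

Feeding each network the peer's selection, rather than its own, is what distinguishes Co-teaching from simple self-paced small-loss filtering: it keeps one network's selection bias from being reinforced in its own next update.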
“…First, several noise-robust loss functions (Ghosh et al, 2017; Wang et al, 2019a; Zhang & Sabuncu, 2018) were proposed that are inherently tolerant to label noise. Second, sample selection methods (also referred to as loss correction in some literature) (Han et al, 2018; Yu et al, 2019; Arazo et al, 2019) are a popular technique that analyzes the per-sample loss distribution and separates the clean and noisy samples. The identified noisy samples are then re-weighted so that they contribute less to the loss computation.…”
Section: Introduction
confidence: 99%
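As a concrete instance of the noise-robust loss family cited in the excerpt above, the generalized cross-entropy of Zhang & Sabuncu (2018) can be sketched as follows. The code is our own minimal illustration of the published formula, not code from any of the cited papers.

```python
import math

def gce_loss(p_true, q=0.7):
    # Generalized cross-entropy L_q = (1 - p^q) / q, where p_true is the
    # predicted probability of the labeled class. As q -> 0 this recovers
    # the usual cross-entropy -log(p); at q = 1 it becomes the bounded,
    # noise-tolerant mean absolute error 1 - p.
    return (1.0 - p_true ** q) / q
```

The robustness comes from boundedness: unlike cross-entropy, whose loss on a confidently mislabeled sample grows without limit, L_q is capped at 1/q, so a few noisy labels cannot dominate the gradient.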
“…There have been several studies (Shu et al 2019; Tzeng et al 2017) to address the WSDA problem by training domain adaptation models with sample reweighting. For example, TCL (Shu et al 2019) selects clean and transferable source samples to train a neural network that has the same structure as DANN (Tzeng et al 2017); Butterfly (Liu et al 2019) picks clean samples from both the source domain and the target domain while sharing the shallow layers of two Co-teaching models (Han et al 2018) for domain adaptation. DCIC (Yu et al 2020) emphasizes clean and transferable source data to construct a denoising maximum mean discrepancy (Pan et al 2010) loss.…”
Section: Related Work
confidence: 99%
“…The intuition for exploring bilateral relationships in WSDA is briefly explained as follows. Similar to learning with label noise, existing WSDA methods would encounter the error accumulation issue: the error that comes from the biased selection of training instances in previous iterations would be directly learnt again in the subsequent training (Han et al 2018). In WSDA, the accumulated error from learning with source domain examples would be amplified, causing a significant increase in the target domain error (Han et al 2020).…”
Section: The Proposed GearNet
confidence: 99%