2018
DOI: 10.48550/arxiv.1804.06872
Preprint

Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels

Abstract: Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore in this paper, we propose a new deep learning paradigm called "Co-teaching" for combating with noisy labels. Namely, we train two deep neural …

Cited by 55 publications (104 citation statements)
References 33 publications
“…Another family of methods focuses primarily on sample selection, where the model selects small-loss samples as "clean" samples under the assumption that the model first fits the clean samples before memorizing the noisy ones (also known as the early-learning assumption) (Arpit et al, 2017; Zhang et al, 2016). Han et al (2018) and Yu et al (2019) proposed Co-teaching, where sample selection is conducted using two networks to separate clean and noisy samples, and the clean samples are then used for further training. MentorNet (Jiang et al, 2018) is a student-teacher framework in which a pre-trained teacher network guides the learning of the student network with clean samples (whose labels are deemed "correct").…”
Section: Related Work
confidence: 99%
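The small-loss exchange described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are ours, the inputs are plain per-sample loss lists, and in the actual Co-teaching paper the keep ratio is annealed over epochs rather than fixed.

```python
def small_loss_selection(losses, keep_ratio):
    # Indices of the keep_ratio fraction of samples with the smallest loss,
    # treated as "clean" under the early-learning assumption.
    n_keep = int(len(losses) * keep_ratio)
    return sorted(range(len(losses)), key=lambda i: losses[i])[:n_keep]

def co_teaching_step(loss_a, loss_b, keep_ratio):
    # One Co-teaching exchange: each network picks its own small-loss
    # samples and hands them to its *peer* for that peer's update,
    # so the two networks filter errors for each other.
    clean_for_b = small_loss_selection(loss_a, keep_ratio)  # A teaches B
    clean_for_a = small_loss_selection(loss_b, keep_ratio)  # B teaches A
    return clean_for_a, clean_for_b
```

Feeding each network the peer's selection, rather than its own, is what distinguishes Co-teaching from simple self-paced small-loss filtering: it keeps one network's selection bias from being reinforced in its own next update.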
“…First, several noise-robust loss functions (Ghosh et al, 2017; Wang et al, 2019a; Zhang & Sabuncu, 2018) were proposed that are inherently tolerant to label noise. Second, sample selection methods (also referred to as loss correction in some literature) (Han et al, 2018; Yu et al, 2019; Arazo et al, 2019) are a popular technique that analyzes the per-sample loss distribution and separates the clean and noisy samples. The identified noisy samples are then re-weighted so that they contribute less to the loss computation.…”
Section: Introduction
confidence: 99%
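As a concrete instance of the noise-robust loss family cited in the excerpt above, the generalized cross-entropy of Zhang & Sabuncu (2018) can be sketched as follows. The code is our own minimal illustration of the published formula, not code from any of the cited papers.

```python
import math

def gce_loss(p_true, q=0.7):
    # Generalized cross-entropy L_q = (1 - p^q) / q, where p_true is the
    # predicted probability of the labeled class. As q -> 0 this recovers
    # the usual cross-entropy -log(p); at q = 1 it becomes the bounded,
    # noise-tolerant mean absolute error 1 - p.
    return (1.0 - p_true ** q) / q
```

The robustness comes from boundedness: unlike cross-entropy, whose loss on a confidently mislabeled sample grows without limit, L_q is capped at 1/q, so a few noisy labels cannot dominate the gradient.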
“…There have been several studies (Shu et al 2019; Tzeng et al 2017) to address the WSDA problem by training domain adaptation models with sample reweighting. For example, TCL (Shu et al 2019) selects clean and transferable source samples to train a neural network that has the same structure as DANN (Tzeng et al 2017); Butterfly (Liu et al 2019) picks clean samples from both the source domain and the target domain while sharing the shallow layers of two Co-teaching models (Han et al 2018) for domain adaptation. DCIC (Yu et al 2020) emphasizes clean and transferable source data to construct a denoising maximum mean discrepancy (Pan et al 2010) loss.…”
Section: Related Work
confidence: 99%
“…The intuition for exploring bilateral relationships in WSDA is briefly explained as follows. Similar to learning with label noise, existing WSDA methods would encounter the error accumulation issue: the error that comes from the biased selection of training instances in previous iterations would be directly learnt again in the subsequent training (Han et al 2018). In WSDA, the accumulated error from learning with source domain examples would be amplified, causing a significant increase in the target domain error (Han et al 2020).…”
Section: The Proposed GearNet
confidence: 99%