2018
DOI: 10.1109/tnnls.2018.2792062

Progressive Stochastic Learning for Noisy Labels

Abstract: Large-scale learning problems require a plethora of labels that can be efficiently collected from crowdsourcing services at low cost. However, labels annotated by crowdsourced workers are often noisy, which inevitably degrades the performance of large-scale optimizations including the prevalent stochastic gradient descent (SGD). Specifically, these noisy labels adversely affect updates of the primal variable in conventional SGD. To solve this challenge, we propose a robust SGD mechanism called progressive stochastic learning…
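To make the abstract's point concrete, here is a minimal sketch (an illustration, not the paper's algorithm): a single flipped label reverses the direction of the primal-variable update in plain SGD with a hinge loss, and a simple loss-based screening step, a generic stand-in for progressive sample selection, skips suspicious examples. The hinge loss, learning rate eta, and threshold tau are assumptions made only for this illustration.

import numpy as np

def hinge_grad(w, x, y):
    # Sub-gradient of the hinge loss max(0, 1 - y * <w, x>) w.r.t. the primal variable w.
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(w)

w = np.zeros(2)
eta = 0.1                                   # assumed learning rate
x, y_true = np.array([1.0, 0.5]), +1.0      # one clean training example
y_noisy = -y_true                           # the same example with a flipped (noisy) label

w_clean = w - eta * hinge_grad(w, x, y_true)    # update moves w toward the correct class
w_noisy = w - eta * hinge_grad(w, x, y_noisy)   # flipped label drives w the opposite way
print(w_clean, w_noisy)                         # [0.1 0.05] vs. [-0.1 -0.05]

def screened_sgd_step(w, x, y, eta=0.1, tau=2.0):
    # Hypothetical screening rule: treat examples with implausibly large loss as
    # likely mislabeled and skip their update (not the paper's exact criterion).
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss > tau:
        return w                # drop the suspicious example instead of updating
    return w - eta * hinge_grad(w, x, y)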

Cited by 141 publications (263 citation statements)
References 30 publications
“…Two types of loss corrections are proposed: forward and backward. Forward loss multiplies model predictions with the noise transition matrix to match them with the noisy labels, while backward loss multiplies the loss with the inverse of the noise transition matrix.

Noise Model Based Methods
1. Noisy Channel
   a. Extra layer: linear fully connected layer [42], [43], softmax layer [44]
   b. Separate network: estimating noise type [45], masking [46], quality embedding [47]
   c. Explicit calculation: EM [26], [48], [49], conditional independence [50], forward & backward loss [51], unsupervised generative model [52], Bayesian form [53]
2. Label Noise Cleansing
   a. Using a reference set: train a cleaner on the reference set [54], [55], clean based on extracted features [56], teacher cleans for student [57], ensemble of networks [58]
   b. Not using a reference set: moving average of network predictions [59], consistency loss [60], ensemble of networks [61], prototypes [62], random split [63], confidence policy [64]
3. Sample Choosing
   a. Self consistency: consistency with the model [65], consistency with a moving average of the model [66], graph-based [67], dividing into two subsets [68]
   b. Curriculum learning: screening loss [69], teacher-student [70], selecting uncertain samples [71], extra layer with similarity matrix [72], curriculum loss [73], data complexity [74], partial labels [75]
   c. Multiple classifiers: consistency of networks [76], co-teaching [77]-[79]
   d. Active learning: relabel hard ...…”
Section: A Noisy Channel (mentioning)
confidence: 99%
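The forward correction described in this excerpt can be sketched in a few lines: the model's estimate of the clean-label posterior is pushed through a noise transition matrix T, with T[i, j] = p(observed label j | true label i), and cross-entropy is then taken against the observed noisy label; backward correction instead re-weights the per-class losses with the inverse of T. The 3-class matrix values and helper names below are illustrative assumptions, not the cited papers' exact setup.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Assumed 3-class noise transition matrix: T[i, j] = p(observed label j | true label i).
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

def forward_corrected_ce(logits, noisy_label, T):
    # Forward correction: compare T^T p(y|x) with the observed noisy label.
    clean_probs = softmax(logits)        # model's estimate of the clean posterior
    noisy_probs = T.T @ clean_probs      # implied distribution over noisy labels
    return -np.log(noisy_probs[noisy_label] + 1e-12)

def backward_corrected_ce(logits, noisy_label, T):
    # Backward correction: per-class cross-entropy losses re-weighted by a row of T^{-1}.
    probs = softmax(logits)
    per_class_loss = -np.log(probs + 1e-12)   # CE against each possible clean label
    return np.linalg.inv(T)[noisy_label] @ per_class_loss

print(forward_corrected_ce(np.array([2.0, 0.5, -1.0]), noisy_label=1, T=T))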
“…All results are shown in Table 1. Our loss modification ("LM") method is compared to a partial label (PL) method [6], a multi-label (ML) method [27], and "PC/S" (the pairwise-comparison formulation with sigmoid loss), which achieved the best performance in [12]. We can see that "PC/S" achieves very good performance.…”
Section: UCI and USPS (mentioning)
confidence: 99%
“…Mixture proportion estimation has been an important ingredient for learning with label noise [31,19,36,5,39,13,12], learning with complementary labels [38], domain adaptation [11,41], semi-supervised learning [29], anomaly rejection [2,30], PU learning [8,23], and multiple instance learning [1], etc. Here, we give a brief summary of the former two applications, and show how the proposed method efficiently solves these problems.…”
Section: Applications (mentioning)
confidence: 99%
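As background for the quoted passage, mixture proportion estimation (MPE) can be stated in one line; the symbols below follow the standard formulation and are not notation taken from this paper or the cited works.

% Given samples from a mixture F and from one component H, estimate the maximal
% weight that H can carry in F:
\[
  F = (1-\kappa^{*})\,G + \kappa^{*}\,H,
  \qquad
  \kappa^{*} \;=\; \sup\bigl\{\kappa \in [0,1] \,:\, F = (1-\kappa)\,G' + \kappa\,H
  \ \text{for some distribution } G'\bigr\}.
\]
% In class-conditional label noise, taking F as the data observed with a given
% noisy label and H as a clean class distribution makes kappa^* correspond to a
% label-noise rate, which is how MPE enters learning with noisy labels.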