2020
DOI: 10.1093/biomet/asaa011
Classification with imperfect training labels

Abstract: We study the effect of imperfect training data labels on the performance of classification methods. In a general setting, where the probability that an observation in the training dataset is mislabelled may depend on both the feature vector and the true label, we bound the excess risk of an arbitrary classifier trained with imperfect labels in terms of its excess risk for predicting a noisy label. This reveals conditions under which a classifier trained with imperfect labels remains consistent for classifying …
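The bound described in the abstract can be sketched schematically as follows. This is a plausible reading of the abstract only, not the paper's exact theorem: the symbols η (regression function), ρ (flip probabilities) and the constant c stand in for the paper's actual notation, and the conditions on ρ are left implicit.

% Schematic reading of the abstract (not the paper's exact statement):
% eta is the regression function, rho the flip probabilities, c a constant
% whose form, and the conditions on rho, are those given in the paper.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $\eta(x)=\mathbb{P}(Y=1\mid X=x)$ and let $\tilde{Y}$ denote the imperfect
training label, with $\rho(x,y)=\mathbb{P}(\tilde{Y}\neq y\mid X=x,\,Y=y)$
allowed to depend on both the feature vector $x$ and the true label $y$.
Write $R(C)$ for the risk of a classifier $C$ on the clean pair $(X,Y)$ and
$\tilde{R}(C)$ for its risk when predicting the noisy label $\tilde{Y}$.
The abstract's bound then has the schematic form
\[
  R(C)-\inf_{C'}R(C') \;\le\; c\,\Bigl\{\tilde{R}(C)-\inf_{C'}\tilde{R}(C')\Bigr\},
\]
so a classifier that is consistent for the noisy problem is, under suitable
conditions on $\rho$, also consistent for the clean one.
\end{document}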

Cited by 32 publications (24 citation statements); references 42 publications.
“…For example, we train CNNs by associating all tiles within a slide with the same label, even though tumor slides will contain some regions that are non-tumor. Our approaches still work because classifiers can tolerate some error in the training data [52]. In the machine learning literature, this corresponds to the general problem of multi-label, multi-instance supervised learning with imbalanced data, an active area of research including for medical image data [28,53-55].…”
Section: Discussion
confidence: 99%
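As a concrete illustration of the weak-labelling step the excerpt above describes, here is a minimal Python sketch; the slide list and the load_slide_tiles helper are hypothetical placeholders, not the cited pipeline.

# Minimal sketch (hypothetical, not the cited pipeline): every tile extracted
# from a slide inherits that slide's label, so some tile labels will be wrong --
# exactly the kind of training-label noise the classifier has to tolerate.

def build_tile_dataset(slides, load_slide_tiles):
    """slides: iterable of (slide_id, slide_label) pairs.
    load_slide_tiles: hypothetical helper returning the tiles of one slide."""
    tiles, labels = [], []
    for slide_id, slide_label in slides:
        for tile in load_slide_tiles(slide_id):
            tiles.append(tile)
            labels.append(slide_label)  # non-tumor regions inherit the tumor label
    return tiles, labels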
“…There has been a fair amount of work recently on label noise (Frénay & Kabán, 2014; Frénay & Verleysen, 2014). Simple methods such as k-nearest neighbors and SVM have been shown to be robust to label noise (Cannings, Fan, & Samworth, 2019). It is less clear, however, how more sophisticated methods, such as those based on random projections, will be affected by noise.…”
Section: Discussion
confidence: 99%
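The robustness claim for k-nearest neighbours can be probed with a small simulation. This is an illustrative sketch on synthetic data using scikit-learn, not an experiment from the cited papers; the noise level and dataset are arbitrary choices.

# Illustrative sketch (not from the cited papers): compare k-NN test accuracy
# when trained on clean labels versus labels flipped uniformly at random.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Flip 20% of the training labels at random (homogeneous label noise).
flip = rng.random(len(y_tr)) < 0.2
y_noisy = np.where(flip, 1 - y_tr, y_tr)

clean = KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_tr).score(X_te, y_te)
noisy = KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_noisy).score(X_te, y_te)
print(f"test accuracy, clean labels: {clean:.3f}; noisy labels: {noisy:.3f}")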
“…Manwani and Sastry [16] discuss the noise-tolerance property of risk minimization. Cannings, Fan and Samworth [5] show that LDA is consistent under the noise, and Blanco, Japón and Puerto [2] propose robust algorithms that apply relabeling and clustering to SVM.…”
Section: Prior Work
confidence: 99%
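One elementary reason such noise-tolerance results can hold: with a homogeneous, feature-independent flip probability ρ < 1/2, the noisy regression function is a monotone transform of the clean one, so thresholding at 1/2 gives the same decision either way. The small check below is a standard identity, not taken from the cited papers.

# Check (standard fact, not from the cited papers): with constant flip
# probability rho < 0.5, P(noisy Y = 1 | x) = (1 - rho)*eta + rho*(1 - eta)
#                                           = rho + (1 - 2*rho)*eta,
# a monotone transform of eta, so the Bayes decision boundary is unchanged.
import numpy as np

rho = 0.2                              # assumed constant flip probability
eta = np.linspace(0.0, 1.0, 100)       # clean P(Y = 1 | X = x) on a grid
eta_noisy = rho + (1 - 2 * rho) * eta  # noisy-label regression function

assert ((eta > 0.5) == (eta_noisy > 0.5)).all()
print("Bayes decision (eta > 0.5) unchanged by homogeneous label noise")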
“…Finally, the general setting, where ρ(x, y) might vary with x, is studied by Cannings, Fan and Samworth [5]. In particular, they examine a setting for k-nearest neighbor where the corrupted labels Ỹ_i are more "clean" than the original labels Y_i, in the sense that the corruption mechanism defined by ρ(x, y) acts to denoise labels near the decision boundary (i.e., η(x) ≈ 0.5). Specifically, suppose that, for values x with η(x) slightly higher than 0.5, we have ρ(x, +1) < ρ(x, −1) (that is, a label Y_i = −1 that "should" instead be positive has a greater chance of being flipped to Ỹ_i = +1), and similarly if η(x) is slightly lower than 0.5 then ρ(x, +1) > ρ(x, −1).…”
Section: Prior Work
confidence: 99%
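The denoising mechanism described in this excerpt can be mimicked with a toy simulation. The particular η, the boundary-weighting, and the flip probabilities below are assumptions chosen for illustration, not the construction in [5]: labels that disagree with the Bayes label are flipped more often near η(x) ≈ 0.5, so the corrupted labels agree with the Bayes label more often than the originals there.

# Illustrative sketch (assumed eta and rho, not the construction in [5]):
# a feature- and label-dependent corruption rho(x, y) that flips labels
# opposing the Bayes label more often near the decision boundary.
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    """Assumed regression function P(Y = +1 | X = x), crossing 1/2 at x = 0."""
    return 1.0 / (1.0 + np.exp(-x))

def rho(x, y):
    """Flip probability: larger for the label opposing the Bayes label,
    and concentrated near the boundary eta(x) = 0.5 (i.e. x = 0)."""
    near_boundary = np.exp(-x ** 2)           # ~1 at the boundary, ~0 far away
    opposes_bayes = y * (eta(x) - 0.5) < 0    # label disagrees with Bayes label
    return np.where(opposes_bayes, 0.4, 0.05) * near_boundary

x = rng.normal(size=100_000)
y = np.where(rng.random(x.size) < eta(x), 1, -1)   # clean labels
flipped = rng.random(x.size) < rho(x, y)
y_tilde = np.where(flipped, -y, y)                 # corrupted labels

bayes = np.where(eta(x) > 0.5, 1, -1)
near = np.abs(x) < 0.5                             # points near the boundary
print("agreement with Bayes label near the boundary:")
print("  clean labels:    ", np.mean(y[near] == bayes[near]).round(3))
print("  corrupted labels:", np.mean(y_tilde[near] == bayes[near]).round(3))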