2020
DOI: 10.1609/aaai.v34i04.6166

Self-Paced Robust Learning for Leveraging Clean Labels in Noisy Data

Abstract: The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled data samples that substantially hinder the performance of machine learning models. Meanwhile, well-labeled data is usually expensive to obtain and only a limited amount is available for training. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set o…

Cited by 14 publications (4 citation statements)
References 22 publications (20 reference statements)
“…SPL is a heuristic that dynamically creates a curriculum based on the losses of the model after each epoch, so as to incorporate the easier samples first. SPL and its variants have been observed to be resilient to noise both in theory (Meng et al., 2016) and practice (Jiang et al., 2018; Zhang et al., 2020), though prior works focus mostly on clean accuracy under unrealizable label noise; in contrast, we measure targeted misclassification accuracy under adversarially selected, realizable noise distributions.…”
Section: Related Work
Mentioning confidence: 99%
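As context for the SPL mechanism summarized in the statement above, here is a minimal, self-contained sketch of a self-paced learning loop for a linear classifier. It is not the cited paper's implementation; the pace schedule and the hyperparameter names (lam, growth, lr) are illustrative assumptions.

```python
import numpy as np

def per_sample_logistic_loss(w, X, y):
    # y in {-1, +1}; returns one loss value per sample.
    return np.log1p(np.exp(-y * (X @ w)))

def self_paced_train(X, y, lam=0.7, growth=1.3, epochs=20, lr=0.1):
    """Illustrative self-paced learning (SPL) loop.

    Each epoch: compute per-sample losses, keep only samples whose loss
    is below the pace threshold `lam` (the "easy" ones), take a gradient
    step on that subset, then grow `lam` so that harder samples are
    admitted in later epochs.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        losses = per_sample_logistic_loss(w, X, y)
        v = (losses < lam).astype(float)        # binary self-paced weights
        if v.sum() > 0:
            margins = y * (X @ w)
            # Gradient of the logistic loss, restricted to selected samples.
            grad = -(v * y / (1 + np.exp(margins))) @ X / v.sum()
            w -= lr * grad
        lam *= growth                           # relax the pace each epoch
    return w
```

The key design choice is visible in the binary weights v: samples whose current loss exceeds the pace threshold are simply excluded from the update, which is the mechanism behind SPL's observed resilience to large-loss (often mislabeled) samples.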
“…Real-world data tend to be massive in quantity, but often contain quite a few unreliable, noisy samples that can lead to decreased generalization performance. Many studies have tried to address this, with some degree of success (Wu and Liu 2007; Zhai et al. 2020; Zhang et al. 2020). However, most of these studies only consider the impact of noisy data on accuracy, rather than on AUC.…”
Section: Introduction
Mentioning confidence: 99%
“…Thus, SPL is an effective method for handling noisy data. Many experimental and theoretical analyses have proved its robustness (Meng, Zhao, and Jiang 2017; Liu, Ma, and Meng 2018; Zhang et al. 2020). However, existing SPL methods are limited to pointwise learning, while AUC maximization is a pairwise learning problem.…”
Section: Introduction
Mentioning confidence: 99%
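To make the pointwise/pairwise distinction in the statement above concrete, the following hedged sketch contrasts a standard pointwise loss with the pairwise surrogate commonly used for AUC maximization. The function names are illustrative and do not come from the cited work.

```python
import numpy as np

def pointwise_logistic_loss(scores, labels):
    # One term per sample: labels in {0, 1} are mapped to signs in {-1, +1}.
    signs = 2 * labels - 1
    return np.mean(np.log1p(np.exp(-signs * scores)))

def pairwise_auc_surrogate(scores, labels):
    # One term per (positive, negative) pair. AUC is the fraction of pairs
    # with score(pos) > score(neg), so this surrogate penalizes pairs whose
    # margin score(pos) - score(neg) is small or negative.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]     # all positive-negative margins
    return np.mean(np.log1p(np.exp(-diffs)))
```

The pairwise surrogate has one term per (positive, negative) pair rather than one per sample, which is why per-sample selection rules such as classical SPL do not transfer directly to AUC maximization.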
“…Hence, deep learning suffers from noisy labels, i.e., labels that are corrupted from the ground truth. Due to the increasing need to handle noisy-label problems in massive datasets, learning with noisy labels (LNL) has received much attention in recent years [6, 9, 10, 18, 25, 34-36, 40].…”
Section: Introduction
Mentioning confidence: 99%