Yangdi Lu scite author profile

Deep learning with noisy labels is challenging because deep neural networks have enough capacity to memorize the noisy labels. In this paper, we propose a learning algorithm called Co-matching, which balances the consistency and divergence between two networks by augmentation anchoring. Specifically, we have one network generate an anchoring label from its prediction on a weakly-augmented image. Meanwhile, we force its peer network, taking the strongly-augmented version of the same image as input, to generate a prediction close to the anchoring label. We then update the two networks simultaneously by selecting small-loss instances to minimize both an unsupervised matching loss (i.e., a measure of the consistency of the two networks) and a supervised classification loss (i.e., a measure of the classification performance). In addition, the unsupervised matching loss makes our method less reliant on noisy labels, which prevents memorization of noisy labels. Experiments on three benchmark datasets demonstrate that Co-matching achieves results comparable to the state-of-the-art methods.
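The small-loss selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cross-entropy form of the matching term, the weight `lam`, the `keep_ratio` parameter, and all function names are assumptions introduced here for illustration, and the network outputs are replaced by hand-written probability vectors.

```python
import math

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred) between two probability vectors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot probability vector."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def per_sample_loss(weak_pred, strong_pred, label, lam=0.5):
    """Combine a supervised term with an unsupervised matching term.

    weak_pred:   prediction of one network on the weakly-augmented image,
                 used here as the anchoring label (assumption).
    strong_pred: prediction of the peer network on the strongly-augmented image.
    lam:         hypothetical weight trading off the two terms.
    """
    supervised = cross_entropy(one_hot(label, len(weak_pred)), weak_pred)
    matching = cross_entropy(weak_pred, strong_pred)
    return (1.0 - lam) * supervised + lam * matching

def select_small_loss(losses, keep_ratio):
    """Return the sorted indices of the keep_ratio fraction with the smallest loss."""
    k = max(1, int(round(keep_ratio * len(losses))))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:k])

# Toy batch: (weak prediction, strong prediction, possibly noisy label).
batch = [
    ([0.9, 0.1], [0.8, 0.2], 0),    # networks agree, label agrees
    ([0.9, 0.1], [0.85, 0.15], 1),  # networks agree, label disagrees
    ([0.2, 0.8], [0.3, 0.7], 1),    # networks agree, label agrees
    ([0.6, 0.4], [0.1, 0.9], 0),    # networks disagree
]
losses = [per_sample_loss(w, s, y) for w, s, y in batch]
kept = select_small_loss(losses, keep_ratio=0.5)
print(kept)
```

In this toy batch the sample whose label contradicts both predictions and the sample on which the two networks disagree receive the largest losses, so neither would be used for the update under these assumptions.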

Towards Update-Efficient and Parallel-Friendly Content-Based Indexing Scheme in Cloud Computing

Zhu et al. 2018, Int. J. Semantic Computing

The sheer volume of content generated by today's Internet services is stored in the cloud. An effective indexing method is important for providing this content to users on demand. Indexing methods that associate user-generated metadata with the content are vulnerable to inaccuracy caused by low-quality metadata. Content-based indexing does not depend on error-prone metadata, but state-of-the-art research focuses on developing descriptive features and overlooks the system-oriented considerations that arise when incorporating these features into practical cloud computing systems. We propose an Update-Efficient and Parallel-Friendly content-based indexing system called Partitioned Hash Forest (PHF). The PHF system incorporates state-of-the-art content-based indexing models and multiple system-oriented optimizations. PHF contains an approximate content-based index and leverages the hierarchical memory system to support a high volume of updates. Additionally, the content-aware data partitioning and lock-free concurrency management modules enable parallel processing of concurrent user requests. We evaluate PHF in terms of indexing accuracy and system efficiency by comparing it with the state-of-the-art content-based indexing algorithm and its variants. We achieve significantly better accuracy with less resource consumption, around 37% faster update processing, and up to 2.5× throughput speedup on a multi-core platform compared to other parallel-friendly designs.

An Ensemble Model for Combating Label Noise

Yang 2022

Labels crawled from web services (e.g., by querying images from search engines or collecting tags from social media images) are prone to noise, and such label noise degrades the classification performance of the resulting deep neural network (DNN) models. In this paper, we propose an ensemble model consisting of two networks to prevent the model from memorizing noisy labels. Within our model, we have one network generate an anchoring label from its prediction on a weakly-augmented image. Meanwhile, we force its peer network, taking the strongly-augmented version of the same image as input, to generate a prediction close to the anchoring label for knowledge distillation. By observing the loss distribution, we use a mixture model to dynamically estimate the clean probability of each training sample and generate a confident clean set. We then train both networks simultaneously on the clean set to minimize our loss function, which contains an unsupervised matching loss (i.e., a measure of the consistency of the two networks) and a supervised classification loss (i.e., a measure of the classification performance). We theoretically analyze the gradient of our loss function to show that it implicitly prevents memorization of the wrong labels. Experiments on two simulated benchmarks and one real-world dataset demonstrate that our approach achieves substantial improvements over the state-of-the-art methods.
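The idea of estimating a per-sample clean probability from the loss distribution can be sketched as follows. The abstract only says "a mixture model"; the two-component one-dimensional Gaussian mixture fitted with EM, the 0.5 threshold, and every function name below are assumptions made for illustration, not the authors' method.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, weights, means, variances):
    """Posterior probability of each of the two components for one loss value."""
    joint = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
    total = sum(joint)
    if total == 0.0:
        # Both densities underflowed: assign the point to the nearest mean.
        nearest = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
        return [1.0 if k == nearest else 0.0 for k in range(2)]
    return [j / total for j in joint]

def fit_two_component_gmm(losses, iters=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture to the losses with EM."""
    spread = (max(losses) - min(losses)) or 1.0
    weights = [0.5, 0.5]
    means = [min(losses), max(losses)]
    variances = [(spread / 2.0) ** 2, (spread / 2.0) ** 2]
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample.
        resp = [posteriors(x, weights, means, variances) for x in losses]
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(losses)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(min_var, var)
    return weights, means, variances

def clean_probabilities(losses):
    """Posterior of the lower-mean component, read as the 'clean' probability."""
    weights, means, variances = fit_two_component_gmm(losses)
    clean = 0 if means[0] <= means[1] else 1
    return [posteriors(x, weights, means, variances)[clean] for x in losses]

# Hypothetical per-sample losses: four small (likely clean), three large (likely noisy).
sample_losses = [0.05, 0.10, 0.08, 0.12, 2.0, 2.3, 1.9]
probs = clean_probabilities(sample_losses)
clean_set = [i for i, p in enumerate(probs) if p > 0.5]  # 0.5 is an assumed threshold
print(clean_set)
```

Re-fitting the mixture on fresh losses at each epoch would make the estimate dynamic in the sense the abstract describes, although the schedule actually used is not stated here.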

SELC: Self-Ensemble Label Correction Improves Learning with Noisy Labels

Lu, He 2022, Preprint

Few-Shot Learning of Video Action Recognition Only Based on Video Contents

Co-matching: Combating Noisy Labels by Augmentation Anchoring
