Deep learning with noisy labels is challenging as deep neural networks have the high capacity to memorize the noisy labels. In this paper, we propose a learning algorithm called Co-matching, which balances the consistency and divergence between two networks by augmentation anchoring. Specifically, we have one network generate anchoring label from its prediction on a weakly-augmented image. Meanwhile, we force its peer network, taking the strongly-augmented version of the same image as input, to generate prediction close to the anchoring label. We then update two networks simultaneously by selecting small-loss instances to minimize both unsupervised matching loss (i.e., measure the consistency of the two networks) and supervised classification loss (i.e. measure the classification performance). Besides, the unsupervised matching loss makes our method not heavily rely on noisy labels, which prevents memorization of noisy labels. Experiments on three benchmark datasets demonstrate that Co-matching achieves results comparable to the state-of-the-art methods.
The sheer volume of contents generated by today’s Internet services is stored in the cloud. The effective indexing method is important to provide the content to users on demand. The indexing method associating the user-generated metadata with the content is vulnerable to the inaccuracy caused by the low quality of the metadata. While the content-based indexing does not depend on the error-prone metadata, the state-of-the-art research focuses on developing descriptive features and misses the system-oriented considerations when incorporating these features into the practical cloud computing systems. We propose an Update-Efficient and Parallel-Friendly content-based indexing system, called Partitioned Hash Forest (PHF). The PHF system incorporates the state-of-the-art content-based indexing models and multiple system-oriented optimizations. PHF contains an approximate content-based index and leverages the hierarchical memory system to support the high volume of updates. Additionally, the content-aware data partitioning and lock-free concurrency management module enable the parallel processing of the concurrent user requests. We evaluate PHF in terms of indexing accuracy and system efficiency by comparing it with the state-of-the-art content-based indexing algorithm and its variances. We achieve the significantly better accuracy with less resource consumption, around 37% faster in update processing and up to 2.5[Formula: see text] throughput speedup in a multi-core platform comparing to other parallel-friendly designs.
The labels crawled from web services (e.g. querying images from search engines and collecting tags from social media images) are often prone to noise, and the presence of such label noise degrades the classification performance of the resulting deep neural network (DNN) models. In this paper, we propose an ensemble model consisting of two networks to prevent the model from memorizing noisy labels. Within our model, we have one network generate an anchoring label from its prediction on a weakly-augmented image. Meanwhile, we force its peer network, taking the strongly-augmented version of the same image as input, to generate prediction close to the anchoring label for knowledge distillation. By observing the loss distribution, we use a mixture model to dynamically estimate the clean probability of each training sample and generate a confidence clean set. Then we train both networks simultaneously by the clean set to minimize our loss function which contains unsupervised matching loss (i.e., measure the consistency of the two networks) and supervised classification loss (i.e. measure the classification performance). We theoretically analyze the gradient of our loss function to show that it implicitly prevents memorization of the wrong labels. Experiments on two simulated benchmarks and one real-world dataset demonstrate that our approach achieves substantial improvements over the state-of-the-art methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.