Domain generalization aims to apply knowledge gained from multiple labeled source domains to unseen target domains. The main difficulty comes from dataset bias: training data and test data have different distributions, and the training set contains heterogeneous samples from different distributions. Let X denote the features, and Y be the class labels. Existing domain generalization methods address the dataset bias problem by learning a domain-invariant representation h(X) that has the same marginal distribution P(h(X)) across multiple source domains. The functional relationship encoded in P(Y|X) is usually assumed to be stable across domains such that P(Y|h(X)) is also invariant. However, it is unclear whether this assumption holds in practical problems. In this paper, we consider the general situation where both P(X) and P(Y|X) can change across all domains. We propose to learn a feature representation which has domain-invariant class conditional distributions P(h(X)|Y). With the conditional invariant representation, the invariance of the joint distribution P(h(X), Y) can be guaranteed if the class prior P(Y) does not change across training and test domains. Extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.
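The last claim of this abstract follows from the product rule. A short worked version is given here as a sketch; the superscripts s and t, marking a source and a target domain, are notation assumed for illustration and do not come from the abstract.

```latex
% Assumed notation: superscript s = a source domain, t = a target domain.
\begin{align*}
P^{s}(h(X), Y) &= P^{s}(h(X) \mid Y)\, P^{s}(Y), \\
P^{t}(h(X), Y) &= P^{t}(h(X) \mid Y)\, P^{t}(Y).
\end{align*}
% If the class-conditional is invariant, P^{s}(h(X) | Y) = P^{t}(h(X) | Y),
% and the class prior is unchanged, P^{s}(Y) = P^{t}(Y), then
\[
P^{s}(h(X), Y) = P^{t}(h(X), Y).
\]
```

If the class prior does change between domains, the two joints differ by the factor P^t(Y)/P^s(Y), which is why the abstract states the prior condition explicitly.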
In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability ρ ∈ [0, 0.5), and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate ρ. We show that the rate is upper bounded by the conditional probability P(Ŷ|X) of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
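The two ideas in this abstract, reweighting the loss and bounding the noise rate by P(Ŷ|X), can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function names, the labels in {+1, -1}, and the weight formula are assumptions derived here from the class-conditional flip model alone, where rho_plus = P(Ŷ=-1|Y=+1) and rho_minus = P(Ŷ=+1|Y=-1). Under that model, P(Ŷ=y|x) = (1 - rho_plus - rho_minus) P(Y=y|x) + rho_opp, with rho_opp the flip rate of the class opposite to y, so the weight P(Y=y|x)/P(Ŷ=y|x) has a closed form and P(Ŷ=y|x) can never fall below rho_opp. The input p_noisy_pos is assumed to be an estimate of P(Ŷ=+1|x) obtained from some probabilistic classifier fitted to the noisy sample.

```python
import numpy as np


def estimate_noise_rates(p_noisy_pos):
    """Estimate flip rates from estimated P(Yhat=+1 | x_i) on the noisy sample.

    Assumption: since P(Yhat=y | x) >= flip rate of the opposite class,
    the minimum over the sample is used as the estimate of that rate.
    """
    p = np.asarray(p_noisy_pos, dtype=float)
    rho_minus = p.min()          # P(Yhat=+1 | Y=-1), bounded by P(Yhat=+1 | x)
    rho_plus = (1.0 - p).min()   # P(Yhat=-1 | Y=+1), bounded by P(Yhat=-1 | x)
    return rho_plus, rho_minus


def importance_weights(p_noisy_pos, noisy_labels, rho_plus, rho_minus):
    """Weights beta_i = P(Y=y_i | x_i) / P(Yhat=y_i | x_i) for labels in {+1, -1}."""
    p = np.asarray(p_noisy_pos, dtype=float)
    y = np.asarray(noisy_labels)
    # Probability of the label that was actually observed.
    p_obs = np.where(y == 1, p, 1.0 - p)
    # Flip rate of the class opposite to the observed label.
    rho_opp = np.where(y == 1, rho_minus, rho_plus)
    # Guard against division by zero for degenerate probability estimates.
    denom = np.maximum((1.0 - rho_plus - rho_minus) * p_obs, 1e-12)
    # Clip at zero: estimation error can push the numerator slightly negative.
    return np.clip((p_obs - rho_opp) / denom, 0.0, None)


if __name__ == "__main__":
    p_hat = [0.2, 0.5, 0.9]       # hypothetical estimates of P(Yhat=+1 | x)
    y_hat = [1, -1, 1]            # observed (noisy) labels
    rp, rm = estimate_noise_rates(p_hat)
    print(rp, rm)
    print(importance_weights(p_hat, y_hat, rp, rm))
```

The resulting weights multiply the per-example surrogate loss; with libraries whose `fit` method accepts a `sample_weight` argument, they could be passed there.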
Multiple kernel clustering (MKC) algorithms optimally combine a group of pre-specified base kernels to improve clustering performance. However, existing MKC algorithms cannot efficiently address the situation where some rows and columns of base kernels are absent. This paper proposes a simple yet effective algorithm to address this issue. Unlike existing approaches, in which incomplete kernels are first imputed and a standard MKC algorithm is then applied to the imputed kernels, our algorithm integrates imputation and clustering into a unified learning procedure. Specifically, we perform multiple kernel clustering directly in the presence of incomplete kernels, which are treated as auxiliary variables to be jointly optimized. Our algorithm does not require that there be at least one complete base kernel over all the samples. Also, it adaptively imputes incomplete kernels and combines them to best serve clustering. A three-step iterative algorithm with proven convergence is designed to solve the resultant optimization problem. Extensive experiments are conducted on four benchmark data sets to compare the proposed algorithm with existing imputation-based methods. Our algorithm consistently achieves superior performance, and the improvement becomes more significant as the missing ratio increases, verifying the effectiveness and advantages of the proposed joint imputation and clustering.