Existing subspace clustering methods typically employ shallow models to estimate the underlying subspaces of unlabeled data points and cluster them into the corresponding groups. However, due to the limited representational capacity of these shallow models, such methods may fail on realistic data that lack a linear subspace structure. To address this issue, we propose a novel subspace clustering approach built on a new deep model, the Structured AutoEncoder (StructAE). The StructAE learns a set of explicit transformations that progressively map input data points into nonlinear latent spaces while preserving both local and global subspace structure. In particular, to preserve local structure, the StructAE learns a representation for each data point by minimizing its own reconstruction error. To preserve global structure, the StructAE incorporates prior structural information by encouraging the learned representations to follow specified reconstruction patterns over the entire data set. To the best of our knowledge, StructAE is one of the first deep subspace clustering approaches. Extensive experiments show that the proposed StructAE significantly outperforms 15 state-of-the-art subspace clustering approaches in terms of five evaluation metrics.
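The abstract does not spell out the objective, but a minimal sketch of such a structured-autoencoder loss might look as follows: a per-sample reconstruction term preserves local structure, while a penalty tying the latent codes to a precomputed self-expression matrix `C` (e.g., obtained by sparse or low-rank subspace clustering on the raw data) preserves global structure. The layer sizes, the matrix `C`, and the trade-off weight `lam` are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch of a StructAE-style objective (assumptions noted above).
import torch
import torch.nn as nn

class StructuredAE(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(d_hidden, d_in), nn.Tanh())

    def forward(self, X):
        H = self.encoder(X)       # nonlinear latent representation
        X_hat = self.decoder(H)   # reconstruction of the input
        return H, X_hat

def structae_loss(X, H, X_hat, C, lam=0.1):
    # Local structure: each point should reconstruct itself.
    rec = ((X - X_hat) ** 2).sum()
    # Global structure: latent codes should follow the prescribed
    # reconstruction pattern H ≈ C @ H over the entire data set.
    struct = ((H - C @ H) ** 2).sum()
    return rec + lam * struct
```

After training, an affinity built from the latent codes (or from `C` itself) would typically be passed to spectral clustering to obtain the final groups.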
Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build this graph using sparse, low-rank, and ℓ2-norm-based representations, and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexities are at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data that were not used to construct the similarity graph: to cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of "sampling, clustering, coding, and classifying". Furthermore, we give an estimation of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
Index Terms: Scalable subspace clustering, out-of-sample problem, sparse subspace clustering, low-rank representation, least square regression, error bound analysis.
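A minimal sketch of the "sampling, clustering, coding, and classifying" pipeline is given below, assuming an ℓ2-regularized least-squares representation for both the in-sample affinity graph and the out-of-sample codes. The function names, the ridge parameter `reg`, and the coefficient-mass classification rule are illustrative assumptions rather than the paper's exact procedure; data points are stored as columns.

```python
# Hedged sketch: in-sample spectral clustering, then out-of-sample coding
# and classification over the sampled data (assumptions noted above).
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_in_sample(X_in, n_clusters, reg=0.1):
    # Least-squares representation: C = (X^T X + reg I)^{-1} X^T X.
    G = X_in.T @ X_in
    C = np.linalg.solve(G + reg * np.eye(G.shape[0]), G)
    A = 0.5 * (np.abs(C) + np.abs(C.T))           # symmetric similarity graph
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='precomputed').fit_predict(A)
    return labels

def classify_out_of_sample(X_in, labels_in, X_out, reg=0.1):
    # Code each new point over the in-sample data, then assign it to the
    # cluster whose members carry the largest share of its coefficients.
    G = X_in.T @ X_in + reg * np.eye(X_in.shape[1])
    codes = np.linalg.solve(G, X_in.T @ X_out)    # one column per new point
    out_labels = np.empty(X_out.shape[1], dtype=int)
    for j in range(X_out.shape[1]):
        energy = np.bincount(labels_in, weights=np.abs(codes[:, j]))
        out_labels[j] = energy.argmax()
    return out_labels
```

Under this scheme, only the sampled subset requires the expensive graph construction; every remaining point is handled by a cheap linear solve and a vote over the in-sample labels.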
Recently, low-rank representation (LRR) has shown promising performance in many real-world applications such as face clustering. However, LRR may not achieve satisfactory results when dealing with data drawn from nonlinear subspaces, since it was originally designed to handle data from linear subspaces in the input space. Kernel-based methods, in contrast, deal with nonlinear data by mapping it from the original input space to a new feature space through a kernel-induced mapping. To effectively cope with nonlinear data, we first propose a kernelized version of LRR for the clean-data case and present a closed-form solution to the resulting optimization problem. To handle corrupted data, we then propose the robust kernel LRR (RKLRR) approach and develop an efficient optimization algorithm based on the alternating direction method to solve it. In particular, we show that both subproblems in our optimization algorithm can be solved efficiently and exactly, and that the algorithm is guaranteed to obtain a globally optimal solution. Moreover, our algorithm can also solve the original LRR problem, which is a special case of RKLRR under the linear kernel. In addition, based on our new optimization technique, the kernelization of some variants of LRR can be achieved in a similar way. Comprehensive experiments on synthetic and real-world data sets clearly demonstrate the efficiency of our algorithm, as well as the effectiveness of RKLRR and the kernelization of two variants of LRR.
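As a hedged illustration of the clean-data case, the sketch below uses the standard result that the noiseless LRR problem (min ||Z||_* s.t. X = XZ) is solved by the shape interaction matrix V_r V_r^T, which depends on the data only through the Gram matrix and therefore kernelizes directly. The RBF kernel and the rank tolerance are assumptions; the paper's actual closed form may differ in detail.

```python
# Hedged sketch of a closed-form clean-data kernel LRR solution
# (assumptions noted above).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_lrr_clean(X, gamma=1.0, tol=1e-8):
    # X: (n_samples, n_features); the kernel replaces the inner products X X^T.
    K = rbf_kernel(X, gamma=gamma)          # kernel (Gram) matrix over samples
    eigvals, eigvecs = np.linalg.eigh(K)    # K = V diag(eigvals) V^T
    V_r = eigvecs[:, eigvals > tol]         # eigenvectors with nonzero eigenvalues
    Z = V_r @ V_r.T                         # low-rank representation matrix
    return Z
```

The resulting `Z` plays the same role as in linear LRR: a symmetrized |Z| serves as the affinity for spectral clustering.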
We study in this paper the problem of learning classifiers from ambiguously labeled images. For