In this paper we focus on realistic clustering problems where the input data is high-dimensional and the clusters have complex, multimodal distribution. In this challenging setting the conventional methods, such as k-centers family, hierarchical clustering or those based on model fitting, are inefficient and typically converge far from the globally optimal solution. As an alternative, we propose a novel unsupervised learning approach which is based on the compressive sensing paradigm. The key idea underlying our algorithm is to monitor the distance between the test sample and its principal projection in each cluster, and continue re-assigning it to the cluster yielding the smallest residual. As a result, we obtain an iterative procedure which, under the compressive assumptions, minimizes the total reconstruction error of all samples from their nearest clusters. To evaluate the proposed approach, we have conducted a series of experiments involving various image collections where the task was to automatically group similar objects. Comparison of the obtained results with those yielded by the state-of-the-art clustering methods provides evidence for high discriminative power of our algorithm.
International Conference on Machine Learning and Applications (ICMLA)This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Abstract-In this paper we focus on realistic clustering problems where the input data is high-dimensional and the clusters have complex, multimodal distribution. In this challenging setting the conventional methods, such as k-centers family, hierarchical clustering or those based on model fitting, are inefficient and typically converge far from the globally optimal solution. As an alternative, we propose a novel unsupervised learning approach which is based on the compressive sensing paradigm. The key idea underlying our algorithm is to monitor the distance between the test sample and its principal projection in each cluster, and continue re-assigning it to the cluster yielding the smallest residual. As a result, we obtain an iterative procedure which, under the compressive assumptions, minimizes the total reconstruction error of all samples from their nearest clusters. To evaluate the proposed approach, we have conducted a series of experiments involving various image collections where the task was to automatically group similar objects. Comparison of the obtained results with thos...