Heterogeneous data co-clustering has attracted growing attention in recent years due to its impact on various applications. While co-clustering algorithms for two types of heterogeneous data (denoted pairwise co-clustering), such as documents and terms, have been well studied in the literature, work on more than two types of heterogeneous data (denoted high-order co-clustering) is still very limited. As an attempt in this direction, in this paper we worked on a specific case of high-order co-clustering in which a central type of object connects the other types, so that the interrelationships form a star structure. This case is a good abstraction of many real-world applications, such as the co-clustering of categories, documents and terms in text mining. In our approach, we treated this kind of problem as the fusion of multiple pairwise co-clustering sub-problems under the constraint of the star structure. Accordingly, we proposed the concept of consistent bipartite graph co-partitioning, and developed an algorithm based on semi-definite programming (SDP) for efficient computation of the clustering results. Experiments on toy problems and real data both verified the effectiveness of our proposed method.
The fuzzy c-means (FCM) algorithm is one of the most frequently used clustering algorithms. The weighting exponent m is a parameter that greatly influences the performance of the FCM. However, the literature has so far offered no theoretical basis for selecting a proper weighting exponent. In this paper, we develop a new theoretical approach to selecting the weighting exponent in the FCM. Based on this approach, we reveal the relation between the stability of the fixed points of the FCM and the data set itself. This relation provides the theoretical basis for selecting the weighting exponent in the FCM. The numerical experiments verify the effectiveness of our theoretical conclusion.
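To make the role of the weighting exponent m concrete, the following is a minimal sketch of the textbook FCM iteration on one-dimensional data. It is not the selection method proposed in the abstract above. The function name `fcm_1d`, the quantile-based initialization, and the fixed iteration count are assumptions made purely for illustration.

```python
def fcm_1d(points, c, m=2.0, iters=100):
    """Textbook fuzzy c-means on 1-D data (illustrative sketch).

    m > 1 is the weighting exponent; c >= 2 is the number of clusters.
    Initialization at evenly spaced order statistics is an assumption.
    """
    srt = sorted(points)
    n = len(srt)
    centers = [srt[round(i * (n - 1) / (c - 1))] for i in range(c)]
    p = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        u = []
        for x in points:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # x sits exactly on a center: give that center full membership
                hit = d.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** p for j in range(c))
                       for i in range(c)]
            u.append(row)
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = [
            sum(u[k][i] ** m * points[k] for k in range(len(points)))
            / sum(u[k][i] ** m for k in range(len(points)))
            for i in range(c)
        ]
    return centers, u


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers_m2, u_m2 = fcm_1d(data, c=2, m=2.0)
centers_m5, u_m5 = fcm_1d(data, c=2, m=5.0)
print(sorted(centers_m2))
print(max(u_m2[0]), max(u_m5[0]))  # memberships become fuzzier as m grows
```

On this toy data set, raising m from 2 to 5 leaves the centers almost unchanged but makes the memberships noticeably less crisp, which is the sensitivity to m that the abstract refers to.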
Image clustering, an important technology for image processing, has been actively researched for a long time. Especially in recent years, with the explosive growth of the Web, image clustering has become a critical technology for helping users digest the large amount of online visual information. However, as far as we know, many previous works on image clustering used either low-level visual features or surrounding texts, but rarely exploited these two kinds of information in the same framework. To tackle this problem, we proposed a novel method named consistent bipartite graph co-partitioning in this paper, which can cluster Web images based on the consistent fusion of the information contained in both low-level features and surrounding texts. In particular, we formulated it as a constrained multiobjective optimization problem, which can be efficiently solved by semi-definite programming (SDP). Experiments on a real-world Web image collection showed that our proposed method outperformed methods based only on low-level features or surrounding texts.
An independent component analysis (ICA) based approach is presented for learning view-specific subspace representations of the face object from multiview face examples. ICA and its variants, namely independent subspace analysis (ISA) and topographic independent component analysis (TICA), take into account the higher order statistics needed for object view characterization. In contrast, principal component analysis (PCA), which de-correlates the second order moments, can hardly reveal good features for characterizing different views when the training data comprises a mixture of multiview examples and the learning is done in an unsupervised way with view-unlabeled data. We demonstrate that ICA, TICA, and ISA are able to learn view-specific basis components from the mixture data in an unsupervised manner. We closely investigate the results learned by ISA in an unsupervised way, reveal some surprising findings, and thereby explain the underlying reasons for the emergent formation of view subspaces. Extensive experimental results are presented. Index Terms: Appearance-based approach, face analysis, independent component analysis (ICA), independent subspace analysis (ISA), learning by examples, topographic independent component analysis (TICA), view subspaces.
Sequential Monte Carlo methods, especially the particle filter (PF) and its various modifications, have been used effectively in dealing with stochastic dynamic systems. The standard PF samples the current state through the underlying state dynamics, then uses the current observation to evaluate the sample's importance weight. However, there is a set of problems in which the current observation provides significant information about the current state but the state dynamics are weak, and thus sampling using the current observation often produces more efficient samples than sampling using the state dynamics. In this article we propose a new variant of the PF, the independent particle filter (IPF), to deal with these problems. The IPF generates exchangeable samples of the current state from a sampling distribution that is conditionally independent of the previous states, a special case of which uses only the current observation. Each sample can then be matched with multiple samples of the previous states in evaluating the importance weight. We present some theoretical results showing that this strategy improves the efficiency of estimation and reduces the resampling frequency. We also discuss some extensions of the IPF, and use several synthetic examples to demonstrate the effectiveness of the method.
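The standard PF step described above (propose from the state dynamics, then weight by the current observation) can be sketched as follows. This is the ordinary bootstrap particle filter, not the IPF, whose matching scheme the abstract does not specify in enough detail to reproduce. The scalar linear-Gaussian model, the function name `bootstrap_pf`, and the parameters `a`, `q`, and `r` are all assumptions chosen for illustration.

```python
import math
import random


def bootstrap_pf(observations, n=500, a=0.9, q=1.0, r=0.5, seed=0):
    """Bootstrap particle filter for an assumed scalar model:

        x_t = a * x_{t-1} + N(0, q^2)   (state dynamics)
        y_t = x_t + N(0, r^2)           (observation)

    Returns the filtered posterior-mean estimate at each time step.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for y in observations:
        # 1. Sample the current state through the state dynamics.
        particles = [a * x + rng.gauss(0.0, q) for x in particles]
        # 2. Use the current observation to compute importance weights
        #    (log-domain, shifted by the max for numerical stability).
        logw = [-0.5 * ((y - x) / r) ** 2 for x in particles]
        mx = max(logw)
        w = [math.exp(lw - mx) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # 3. Weighted mean as the state estimate.
        estimates.append(sum(wi * x for wi, x in zip(w, particles)))
        # 4. Multinomial resampling at every step (a simplifying assumption).
        particles = rng.choices(particles, weights=w, k=n)
    return estimates


est = bootstrap_pf([2.0] * 20)
print(est[-1])
```

When the observation noise `r` is small relative to the process noise `q`, most particles proposed in step 1 receive negligible weight in step 2. That is the regime in which the abstract argues that sampling from the current observation is more efficient.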