Abstract-Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
Abstract. This paper proposes a novel curvilinear structure detector, called Optimally Oriented Flux (OOF). OOF finds an optimal axis on which image gradients are projected in order to compute the image gradient flux. The computation of OOF is localized at the boundaries of local spherical regions. It avoids considering closely located adjacent structures. The main advantage of OOF is its robustness against the disturbance induced by closely located adjacent objects. Moreover, the analytical formulation of OOF introduces no additional computation load as compared to the calculation of the Hessian matrix which is widely used for curvilinear structure detection. It is experimentally demonstrated that OOF delivers accurate and stable curvilinear structure detection responses under the interference of closely located adjacent structures as well as image noise.
Understanding the structure of multidimensional patterns, especially in unsupervised cases, is of fundamental importance in data mining, pattern recognition, and machine learning. Several algorithms have been proposed to analyze the structure of high-dimensional data based on the notion of manifold learning. These algorithms have been used to extract the intrinsic characteristics of different types of high-dimensional data by performing nonlinear dimensionality reduction. Most of these algorithms operate in a "batch" mode and cannot be efficiently applied when data are collected sequentially. In this paper, we describe an incremental version of ISOMAP, one of the key manifold learning algorithms. Our experiments on synthetic data as well as real world images demonstrate that our modified algorithm can maintain an accurate low-dimensional representation of the data in an efficient manner.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.