Processing applications with a large number of dimensions has been a challenge to the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential pre-processing method to remove noisy features. In the literature there are only a few methods proposed for feature selection for clustering, and almost all of those methods are 'wrapper' techniques that require a clustering algorithm to evaluate the candidate feature subsets. The wrapper approach is largely unsuitable in real-world applications due to its heavy reliance on clustering algorithms that require parameters such as the number of clusters, and due to the lack of suitable clustering criteria to evaluate clustering in different subspaces. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The proposed method is based on the observation that data with clusters has a very different point-to-point distance histogram than data without clusters. Using this, we propose an entropy measure that is low if the data has distinct clusters and high otherwise. The entropy measure is suitable for selecting the most important subset of features because it is invariant with the number of dimensions and is affected only by the quality of clustering. Extensive performance evaluation over synthetic, benchmark, and real datasets shows its effectiveness.
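To make the distance-histogram observation concrete, the following is a minimal sketch of one plausible distance-based entropy score. It is not the exact measure proposed in the paper, which the abstract does not specify. The sketch assumes Euclidean distances, min-max normalization of the pairwise distances to [0, 1], and a mean binary entropy over all pairs; the names `pairwise_distance_entropy` and `project` and the toy data are hypothetical. Distances near 0 (same cluster) or near 1 (different clusters) contribute little entropy, while mid-range distances, typical of unclustered data, contribute the most.

```python
import math
import random
from itertools import combinations


def pairwise_distance_entropy(points):
    """Mean binary entropy of min-max normalized pairwise Euclidean distances.

    Assumed illustrative form, not the paper's exact definition.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    lo, hi = min(dists), max(dists)
    if hi == lo:
        return 0.0  # all distances equal: nothing to normalize
    total = 0.0
    for d in dists:
        x = (d - lo) / (hi - lo)
        if 0.0 < x < 1.0:  # x = 0 or x = 1 contributes zero entropy
            total -= x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x)
    return total / len(dists)


def project(points, features):
    """Keep only the listed feature indices of every point."""
    return [tuple(p[f] for f in features) for p in points]


# Hypothetical toy data: feature 0 holds two tight clusters (around 0 and 1),
# feature 1 is uniform noise that carries no cluster structure.
rng = random.Random(0)
data = [
    (rng.gauss(center, 0.02), rng.uniform(0.0, 1.0))
    for center in (0.0, 1.0)
    for _ in range(50)
]

e_f0 = pairwise_distance_entropy(project(data, [0]))  # clustered feature
e_f1 = pairwise_distance_entropy(project(data, [1]))  # noise feature
print(f"entropy on feature 0: {e_f0:.3f}")
print(f"entropy on feature 1: {e_f1:.3f}")
```

Under these assumptions the clustered feature scores lower than the noise feature, so a filter that ranks candidate feature subsets by this score would prefer feature 0 without ever running a clustering algorithm.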