The cluster validity index plays an important role in assessing the quality of clustering results. However, most existing validity indices take a trial-and-error strategy, and their correctness depends not only on the measurements of intra- and inter-cluster distances but also on the specific clustering algorithms and data structures. Consequently, the applications of these indices are limited in practice. In this paper, we first define the total surface area and volume of all clusters in a 2-dimensional data space, thereby recovering their natural interrelation across various numbers of clusters. On this basis, a novel validity index is proposed to directly assess the clustering results of any dataset, which does not require any trial-and-error process, specific clustering algorithm, data structure, or measurement of intra- and inter-cluster distances. In the case of a high-dimensional data space, all clusters are transformed into spherical clusters of normalized size in a 2-dimensional data space through a multidimensional scaling transformation. Two groups of typical synthetic datasets and real datasets with various characteristics are used to validate the novel validity index. INDEX TERMS Cluster validity index, multidimensional scaling transformation, volume and surface area.
The evaluation of clustering results is an important component of clustering analysis, and it can be conducted with a cluster validity index. However, the performance of most existing indices depends not only on the specific clustering algorithms but also on the measurements of within- and between-cluster distances and on data structures, resulting in limited applications in practice. In this paper, a new within-cluster distance under a general assumption is defined first. After adjusting the within-cluster distances of each point according to the adjustment rule, a novel cluster validity index is proposed. Moreover, the notion of a chain is introduced to eliminate the effects of the sizes, densities, and shapes of clusters. This index does not need any prior information about clustering algorithms and is independent of data structures. Two groups of synthetic datasets with various characteristics and real-world datasets are used to validate the proposed validity index. Experimental results demonstrate that the evaluation accuracy of this index is higher than that of existing typical indices and that the index performs well on datasets with irregularly shaped clusters. INDEX TERMS Cluster validity index, within-cluster distance (WD), between-cluster distance (BD).
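For orientation, the kind of within- and between-cluster distance measurement that conventional indices rely on can be sketched as follows. This is a minimal illustration, not the index proposed in the abstract above: the function names (`mean_within_distance`, `min_between_distance`, `ratio_index`), the choice of mean pairwise distance and minimum cross-cluster distance, and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def mean_within_distance(clusters):
    """Average pairwise distance between points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs) if pairs else 0.0


def min_between_distance(clusters):
    """Smallest distance between two points that lie in different clusters."""
    return min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )


def ratio_index(clusters):
    """Hypothetical compactness/separation ratio; lower suggests a better partition."""
    return mean_within_distance(clusters) / min_between_distance(clusters)


# Toy data (assumed): two well-separated groups of three 2-D points each.
points_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
points_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good = [points_a, points_b]                    # partition matching the groups
bad = [points_a + points_b[:1], points_b[1:]]  # one point assigned to the wrong group

if __name__ == "__main__":
    print("good partition:", ratio_index(good))
    print("bad partition: ", ratio_index(bad))
```

A ratio of this kind has to be recomputed for every candidate partition and compared, and its value depends on how the two distances are defined, which is the dependence the abstract describes as limiting.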
The evaluation of clustering results plays an important role in clustering analysis. However, in practice the existing validity indices are limited to a specific clustering algorithm, specific clustering parameters, and specific assumptions. In this paper, we propose a novel validity index that addresses these problems based on two complementary measures: boundary points matching and interior points connectivity. First, when any clustering algorithm is performed on a dataset, we extract all boundary points of the dataset and of its partitioned clusters using a nonparametric metric, and the measure of boundary points matching is computed. Second, the interior points connectivity of both the dataset and all the partitioned clusters is measured. The proposed validity index can evaluate different clustering results on the dataset obtained from different clustering algorithms, which cannot be evaluated by the existing validity indices at all. Experimental results demonstrate that the proposed validity index can evaluate clustering results obtained by an arbitrary clustering algorithm and find the optimal clustering parameters.