Abstract
Clustering is an attractive technique used in many fields to deal with large-scale data. Many clustering algorithms have been proposed so far, and among the most popular are density-based approaches, which can identify clusters of arbitrary shape in datasets. The most common of them is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied and has many different modifications. However, a fundamental issue remains: the right choice of its two input parameters, i.e., the eps radius and the MinPts density threshold. Choosing these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method is proposed that determines the right values of the parameters for different kinds of clusters. The method detects sharp increases in the distances produced by a function that computes, for each element of a dataset, the distance to its k-th nearest neighbor. Experimental results obtained for several different datasets confirm the very good performance of the newly proposed method.
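The k-th nearest neighbor distance idea can be illustrated with a small sketch. This is a minimal illustration under our own assumptions, not the authors' method: it brute-forces each point's distance to its k-th nearest neighbor, sorts those values, and treats any position where the next value is at least `ratio` times the current one as a "sharp increase". The ratio test and its default of 2.0 are hypothetical choices, since the abstract does not state the exact detection criterion. The value just before each jump is returned as a candidate eps for one density level. The names `kdist`, `sharp_increases` and `candidate_eps` are hypothetical.

```python
import math

def kdist(points, k):
    """Distance from each point to its k-th nearest neighbor (the point itself excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def sharp_increases(values, ratio=2.0):
    """Sort the k-dist values and return (sorted values, indices i where s[i+1] / s[i] >= ratio)."""
    s = sorted(values)
    jumps = [i for i in range(len(s) - 1) if s[i] > 0 and s[i + 1] / s[i] >= ratio]
    return s, jumps

def candidate_eps(points, k, ratio=2.0):
    """Hypothetical helper: the k-dist value just before each sharp jump is one eps candidate."""
    s, jumps = sharp_increases(kdist(points, k), ratio)
    return [s[i] for i in jumps]

# Toy data: a tight square (spacing 1), a looser square (spacing 5) and one far outlier.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (20, 20), (20, 25), (25, 20), (25, 25),
        (100, 100)]
print(candidate_eps(data, k=1))  # → [1.0, 5.0]
```

Here the sorted 1-distances are four values of 1.0, four of 5.0 and one of about 106, so two jumps are detected and each density level gets its own eps candidate. In DBSCAN practice k is usually tied to MinPts; how the proposed method links the two is not specified in the abstract.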
In this paper, a new cluster validity index is proposed, which can be considered a measure of the accuracy of data set partitioning. The new index, called the STR index, is defined as the product of two components that capture changes in the compactness and separability of clusters during a clustering process. The maximum value of this index identifies the best clustering scheme. Three popular algorithms have been applied as the underlying clustering techniques: complete-linkage, expectation maximization and K-means. The performance of the new index is demonstrated on several artificial and real-life data sets. Moreover, the new index has been compared with other well-known indices, i.e., the Dunn, Davies-Bouldin, PBM and Silhouette indices, using the number of clusters in a data set as the comparison criterion. The results demonstrate the superiority of the new index over these indices.
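The abstract does not give the STR formula, so the sketch below uses the Dunn index, one of the comparison indices named above, as a stand-in. It only illustrates the shared selection rule: compute a validity index for each candidate partition and keep the one with the maximum value. This uses one common variant of Dunn's index (smallest distance between points of different clusters divided by the largest cluster diameter). The function names `dunn_index` and `best_partition` are hypothetical, and this is not the STR index.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn index of a partition given as a list of clusters (lists of points); higher is better.

    Requires at least two clusters.
    """
    # Compactness: largest distance between two points in the same cluster.
    max_diam = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Separation: smallest distance between points of different clusters.
    min_sep = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_sep / max_diam if max_diam > 0 else math.inf

def best_partition(partitions):
    """Return the candidate partition with the maximum index value."""
    return max(partitions, key=dunn_index)

# Three candidate partitions of the same four points.
A = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]    # two well-separated pairs
B = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]    # clusters cut across the gap
C = [[(0, 0), (0, 1)], [(10, 0)], [(10, 1)]]  # right pair split in two
print(dunn_index(A), dunn_index(C))  # → 10.0 1.0
print(best_partition([B, C, A]) == A)  # → True
```

In an experiment like the one described, each candidate partition would come from running complete-linkage, EM or K-means with a different number of clusters, and the number of clusters of the maximizing partition is the index's estimate.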