Density peaks clustering (DPC) is known as an excellent approach for detecting complicated-shaped clusters in high-dimensional data. However, it cannot detect outliers, hub nodes, or boundary nodes, nor can it form low-density clusters. A halo is therefore adopted to improve the performance of DPC in processing low-density nodes. This paper explores the potential reasons for adopting halos instead of low-density nodes and proposes an improved halo-node recognition method for the density peaks clustering algorithm (HaloDPC). HaloDPC improves the handling of varying densities, irregular shapes, and the number of clusters, as well as outlier and hub node detection. The advantages of the HaloDPC algorithm are demonstrated on several test cases.
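For readers unfamiliar with DPC, the two per-point quantities it relies on can be sketched as follows. This is a minimal illustration of the standard cutoff-kernel formulation, not the HaloDPC method itself, which the abstract does not specify; the function name `dpc_rho_delta` and the cutoff parameter `dc` are illustrative assumptions.

```python
import numpy as np

def dpc_rho_delta(X, dc):
    """Return local density rho and separation delta for each row of X.

    Hypothetical helper for illustration only; dc is the cutoff distance.
    """
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Cutoff-kernel density: number of *other* points closer than dc.
    rho = (d < dc).sum(axis=1) - 1
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # Distance to the nearest point of strictly higher density.
        # A point with no denser neighbour gets its largest distance instead.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho, delta
```

Points with both large `rho` and large `delta` are candidate cluster centres, while points with small `rho` and large `delta` behave like outliers. Low-density points near cluster edges are the ones a halo step has to handle.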
The density peaks clustering (DPC) algorithm is not sensitive to halo nodes. Halo nodes lie at the edges of the clusters found by DPC and have lower local density, and the outliers are distributed among them. A novel halo identification method based on DPC exploits the ability of the DBSCAN algorithm to identify outliers quickly, which improves sensitivity to halo nodes. However, the identified halo nodes cannot be effectively assigned to adjacent clusters. This paper therefore uses the K-nearest neighbor (KNN) algorithm to classify the identified halo nodes. KNN is one of the simplest and most efficient classification methods, with the advantages of high accuracy, insensitivity to outliers, and no assumptions about the input data. On this basis, we propose a novel halo node assignment algorithm for density peaks clustering based on K-nearest neighbor theory (KNN-HDPC). KNN-HDPC captures the internal relations between outliers and cluster nodes more deeply and so uncovers deeper relations between nodes. Experimental results demonstrate that the proposed algorithm can effectively cluster and reclassify large volumes of complex data and can quickly uncover the potential relationships between noise points and cluster points. The improved algorithm achieves higher clustering accuracy than the original DPC algorithm and yields more robust clustering results. INDEX TERMS Density peaks clustering, halo node, K-nearest neighbor, low-density nodes.
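The idea of reassigning halo nodes by nearest-neighbor voting can be sketched roughly as below. The abstract gives no algorithmic details, so this is an assumption-laden illustration rather than the KNN-HDPC procedure: the function name `assign_halo_by_knn`, the boolean `halo_mask` (assumed to come from an earlier halo or noise detection step), and the default `k=3` are all hypothetical.

```python
import numpy as np

def assign_halo_by_knn(X, labels, halo_mask, k=3):
    """Give each halo point the majority label of its k nearest core points.

    Assumes core (non-halo) points carry non-negative integer cluster labels;
    the labels of halo points are ignored and overwritten in the result.
    """
    core_idx = np.flatnonzero(~halo_mask)
    new_labels = labels.copy()
    for i in np.flatnonzero(halo_mask):
        # Distances from halo point i to every core point.
        dist = np.linalg.norm(X[core_idx] - X[i], axis=1)
        # Indices (into X) of the k closest core points.
        nearest = core_idx[np.argsort(dist)[:k]]
        # Majority vote over their cluster labels.
        votes = np.bincount(labels[nearest])
        new_labels[i] = votes.argmax()
    return new_labels
```

Voting only among core points keeps already-flagged halo nodes from influencing one another, which is one simple design choice; a method that lets assigned halo nodes vote in later rounds would behave differently.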
In real-world applications, heterogeneous interdependent attributes consisting of both discrete and numerical variables are ubiquitous. Such data sets are usually represented as an information table, which assumes that the attributes are independent. Very often, however, the attributes are interdependent, either explicitly or implicitly. Limited research has been conducted on analyzing such attribute interactions, which causes the analysis results to be more local than global. This paper proposes coupled heterogeneous attribute analysis to capture the interdependence among mixed data by addressing coupling context and coupling weights in unsupervised learning. These global couplings integrate the interactions within discrete attributes, within numerical attributes, and across the two to form a coupled representation for mixed-type objects based on dimension conversion and feature selection. This work takes one step toward explicitly modeling the interdependence of heterogeneous attributes in mixed data, verified by applications in data structure analysis, data clustering evaluation, and density comparison. Substantial experiments on 12 UCI data sets show that our approach can effectively capture the global couplings of heterogeneous attributes and outperforms state-of-the-art methods, as supported by statistical analysis.