We propose two new algorithms for clustering graphs and networks. The first, called the K-algorithm, is derived directly from the k-means algorithm. It applies similar iterative local optimization but without the need to calculate the means. It inherits the properties of k-means clustering in terms of both good local optimization capability and the tendency to get stuck at a local optimum. The second algorithm, called the M-algorithm, gradually improves on the results of the K-algorithm to find new and potentially better local optima. It repeatedly merges and splits random clusters and tunes the results with the K-algorithm. Both algorithms are general in the sense that they can be used with different cost functions. We consider the conductance cost function and also introduce two new cost functions, called inverse internal weight and mean internal weight. According to our experiments, the M-algorithm outperforms eight other state-of-the-art methods. We also perform a case study by analyzing the clustering results for a disease co-occurrence network, which demonstrates the usefulness of the algorithms in an important real-life application.
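To make the conductance cost function concrete, the following is a minimal sketch using the standard textbook definition: the weight of edges leaving a cluster divided by the smaller of the two total weighted degrees (inside and outside the cluster). The function name `conductance` and the edge-list input format are illustrative assumptions, not taken from the paper, and the two new cost functions are not shown because the abstract does not define them.

```python
def conductance(edges, cluster):
    """Conductance of `cluster` in an undirected weighted graph.

    edges:   iterable of (u, v, weight), each undirected edge listed once
             (assumed input format, chosen for this sketch).
    cluster: collection of node ids forming one cluster.
    """
    cluster = set(cluster)
    cut = 0.0      # total weight of edges crossing the cluster boundary
    vol_in = 0.0   # sum of weighted degrees of nodes inside the cluster
    vol_out = 0.0  # sum of weighted degrees of nodes outside the cluster
    for u, v, w in edges:
        # Each edge contributes its weight to the degree of both endpoints.
        for node in (u, v):
            if node in cluster:
                vol_in += w
            else:
                vol_out += w
        # An edge is cut when exactly one endpoint lies in the cluster.
        if (u in cluster) != (v in cluster):
            cut += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


# Two triangles joined by a single bridge edge (2, 3), all weights 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
score = conductance(edges, {0, 1, 2})
```

Lower values indicate a cluster that is well separated from the rest of the graph; in this toy graph the triangle `{0, 1, 2}` has one cut edge against a volume of 7 on each side.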
Although many fast methods exist for constructing a kNN-graph for low-dimensional data, it is still an open question how to do it efficiently for high-dimensional data. We present a new method to construct an approximate kNN-graph for medium- to high-dimensional data. Our method uses one-dimensional mapping with a Z-order curve to construct an initial graph and then continues to improve this using neighborhood propagation. Experiments show that the method is faster than the compared methods on five different benchmark datasets, the dimensionality of which ranges from 14 to 784. Compared to a brute-force approach, the method provides a speedup between 12.7:1 and 414.2:1 depending on the dataset. We also show that errors in the approximate kNN-graph are more likely to originate from outlier points, and that the points likely to have errors in their neighbors can be detected at runtime.
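The idea of building an initial graph from a one-dimensional Z-order mapping can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes small non-negative integer coordinates, the helper names `morton_code` and `initial_knn_candidates` are hypothetical, and the abstract does not say how high-dimensional data is quantized or reduced before the mapping. The neighborhood propagation step that refines the graph is not shown.

```python
def morton_code(point, bits=8):
    """Interleave the bits of non-negative integer coordinates into one Z-order key."""
    code = 0
    dims = len(point)
    for bit in range(bits):
        for d, coord in enumerate(point):
            # Bit `bit` of coordinate d goes to position bit * dims + d.
            code |= ((coord >> bit) & 1) << (bit * dims + d)
    return code


def initial_knn_candidates(points, k, bits=8):
    """Sort points along the Z-order curve and pick neighbors from a sliding window."""
    order = sorted(range(len(points)), key=lambda i: morton_code(points[i], bits))
    candidates = {}
    for pos, i in enumerate(order):
        lo = max(0, pos - k)
        hi = min(len(order), pos + k + 1)
        # Points adjacent in the sorted order are candidate neighbors.
        window = [order[j] for j in range(lo, hi) if j != pos]
        # Rank candidates by true squared Euclidean distance and keep the k best.
        window.sort(key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(points[i], points[j])))
        candidates[i] = window[:k]
    return candidates


points = [(0, 0), (1, 0), (0, 1), (7, 7), (6, 7)]
graph = initial_knn_candidates(points, k=1)
```

Because nearby positions on the curve are only approximately nearby in space, the result is a starting point whose errors a later refinement step would need to correct.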