Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. One of the most successful techniques for finding overlapping communities is based on local optimization and expansion of a community metric around a seed set of vertices. In this paper, we propose an efficient overlapping community detection algorithm using a seed set expansion approach. In particular, we develop new seeding strategies for a personalized PageRank scheme that optimizes the conductance community score. The key idea of our algorithm is to find good seeds, and then expand these seed sets using the personalized PageRank clustering procedure. Experimental results show that this seed set expansion approach outperforms other state-of-the-art overlapping community detection methods. We also show that our new seeding strategies are better than previous strategies, and are thus effective in finding good overlapping clusters in a graph.
Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this paper, we propose an efficient overlapping community detection algorithm using a seed expansion approach. The key idea of our algorithm is to find good seeds, and then greedily expand these seeds based on a community metric. Within this seed expansion method, we investigate the problem of how to determine good seed nodes in a graph. In particular, we develop new seeding strategies for a personalized PageRank clustering scheme that optimizes the conductance community score. Experimental results show that our seed expansion algorithm outperforms other state-of-the-art overlapping community detection methods in terms of producing cohesive clusters and identifying ground-truth communities. We also show that our new seeding strategies are better than existing strategies, and are thus effective in finding good overlapping communities in real-world networks.
We consider the general k-way clustering problem in signed social networks where relationships between entities can be either positive or negative. Motivated by social balance theory, the clustering problem in signed networks aims to find mutually antagonistic groups such that entities within the same group are friends with each other. A recent method proposed in [13] extended the spectral clustering algorithm to the signed network setting by considering the signed graph Laplacian. This has been shown to be equivalent to finding clusters that minimize the 2-way signed ratio cut. In this paper, we show that there is a fundamental weakness when we directly extend the signed Laplacian to the k-way clustering problem. To overcome this weakness, we formulate new k-way objectives for signed networks. In particular, we propose a criterion that is analogous to the normalized cut, called balance normalized cut, which is not only theoretically sound but also experimentally effective in k-way clustering. In addition, we prove that these objectives are equivalent to weighted kernel k-means objectives by choosing an appropriate kernel matrix. Employing this equivalence, we develop a multilevel clustering framework for signed networks. In this framework, we coarsen the graph level by level and refine the clustering results at each level via a k-means based algorithm so that the signed clustering objectives are optimized. This approach gives good quality clustering results, and is also highly efficient and scalable. In experiments, we see that our multilevel approach is competitive to other state-of-the-art methods, while it is much faster and more scalable. In particular, the largest graph we have considered in our experiments contains 1 million nodes and 100 million edges -this graph can be clustered in less than four hundred seconds using our algorithm.
Traditional clustering algorithms, such as k-means, output a clustering that is disjoint and exhaustive, that is, every single data point is assigned to exactly one cluster. However, in real datasets, clusters can overlap and there are often outliers that do not belong to any cluster. This is a well recognized problem that has received much attention in the past, and several algorithms, such as fuzzy kmeans have been proposed for overlapping clustering. However, most existing algorithms address either overlap or outlier detection and do not tackle the problem in a unified way. In this paper, we propose a simple and intuitive objective function that captures the issues of overlap and nonexhaustiveness in a unified manner. Our objective function can be viewed as a reformulation of the traditional k-means objective, with easy-to-understand parameters that capture the degrees of overlap and non-exhaustiveness. By studying the objective, we are able to obtain a simple iterative algorithm which we call NEO-K-Means (Non-Exhaustive Overlapping K-Means). Furthermore, by considering an extension to weighted kernel k-means, we can tackle the case of non-exhaustive and overlapping graph clustering. This extension allows us to apply our NEO-K-Means algorithm to the community detection problem, which is an important task in network analysis. Our experimental results show that the new objective and algorithm are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness; for the case of graphs, we show that our algorithm outperforms state-of-the-art overlapping community detection methods.
Abstract-Clustering of social networks is an important task for their analysis; however, most existing algorithms do not scale to the massive size of today's social networks. A popular class of graph clustering algorithms for large-scale networks, such as PMetis, KMetis and Graclus, is based on a multilevel framework. Generally, these multilevel algorithms work reasonably well on networks with a few million vertices. However, when the network size increases to the scale of 10 million vertices or greater, the performance of these algorithms rapidly degrades. Furthermore, an inherent property of social networks, the power law degree distribution, makes these algorithms infeasible to apply to large-scale social networks. In this paper, we propose a scalable and memory-efficient clustering algorithm for large-scale social networks. We name our algorithm GEM, by mixing two key concepts of the algorithm, Graph Extraction and weighted kernel k-Means. GEM efficiently extracts a good skeleton graph from the original graph, and propagates the clustering result of the extracted graph to the rest of the network. Experimental results show that GEM produces clusters of quality comparable to or better than existing state-of-the-art graph clustering algorithms, while it is much faster and consumes much less memory. Furthermore, the parallel implementation of GEM, called PGEM, not only produces higher quality of clusters but also achieves much better scalability than most current parallel graph clustering algorithms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.