An important component of current research in big data is graph analytics on very large graphs. Of the many problems of interest in this domain, graph pattern matching is both challenging and practically important. Given a relatively small query graph, the problem is to find matching patterns in a large data graph. Algorithms that address this problem are used in large social networks and graph databases. Though fast querying is highly desirable, the scalability of pattern matching algorithms is hindered by the NP-completeness of the subgraph isomorphism problem. This paper presents a conceptually simple, memory-efficient, pruning-based algorithm for the subgraph isomorphism problem that outperforms commonly used algorithms on large graphs. The high performance is due in large part to the effectiveness of the pruning algorithm, which in many cases removes a large percentage of the vertices not found in isomorphic matches. The runtime of the algorithm is compared with that of other algorithms on graphs of up to 10 million vertices and 250 million edges.
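To make the idea of candidate pruning concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes undirected, vertex-labeled graphs stored as adjacency sets. The function name `prune_candidates` and the label/degree filter followed by iterative neighborhood refinement are illustrative choices.

```python
def prune_candidates(query_adj, query_label, data_adj, data_label):
    """Return, for each query vertex, the data vertices that survive pruning.

    Hypothetical helper: graphs are dicts mapping a vertex to a set of neighbours.
    """
    # Initial filter: labels must match and the data vertex needs enough neighbours.
    cand = {
        u: {v for v in data_adj
            if data_label[v] == query_label[u]
            and len(data_adj[v]) >= len(query_adj[u])}
        for u in query_adj
    }
    # Refinement: v stays a candidate for u only if, for every query neighbour w
    # of u, v has at least one data neighbour that is still a candidate for w.
    changed = True
    while changed:
        changed = False
        for u, nbrs in query_adj.items():
            for v in list(cand[u]):
                if any(not (data_adj[v] & cand[w]) for w in nbrs):
                    cand[u].discard(v)
                    changed = True
    return cand


# Query: path a - b - c with labels A, B, C.
query_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
query_label = {"a": "A", "b": "B", "c": "C"}

# Data graph: 1 - 2 - 3 matches the query; 4 - 5 and the isolated vertex 6 do not.
data_adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
data_label = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B", 6: "C"}

cand = prune_candidates(query_adj, query_label, data_adj, data_label)
```

Pruning of this kind only shrinks the search space. Surviving candidates are a superset of the vertices in true matches, so a search step is still needed to enumerate the isomorphic subgraphs.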
Community detection has become an extremely active area of research in recent years, with researchers proposing various new metrics and algorithms to address the problem. Recently, the Weighted Community Clustering (WCC) metric was proposed as a novel way to judge the quality of a community partitioning based on the distribution of triangles in the graph, and was demonstrated to yield superior results over other commonly used metrics like modularity. The same authors later presented a parallel algorithm for optimizing WCC on large graphs. In this paper, we propose a new distributed, vertex-centric algorithm for community detection using the WCC metric. Results are presented that demonstrate the algorithm's performance and scalability on up to 32 worker machines and real graphs of up to 1.8 billion vertices. The algorithm scales best with the largest graphs, and to our knowledge, it is the first distributed algorithm for optimizing the WCC metric.
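Because WCC is described as depending on the distribution of triangles, a basic building block is counting the triangles each vertex takes part in, both in the whole graph and inside a given community. The sketch below shows only that counting step, assuming an undirected graph stored as adjacency sets. It is not the WCC formula or the distributed algorithm from the paper, and the name `triangles_per_vertex` is hypothetical.

```python
from itertools import combinations


def triangles_per_vertex(adj, members=None):
    """Count the triangles through each vertex.

    If `members` is given, only triangles whose three vertices all lie in
    `members` are counted, and only vertices in `members` are reported.
    """
    allowed = set(adj) if members is None else set(members)
    counts = {}
    for x in allowed:
        nbrs = adj[x] & allowed
        # Each pair of neighbours of x that is itself connected closes a triangle.
        counts[x] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts


# Two triangles sharing vertex 3: (1, 2, 3) and (3, 4, 5).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

total = triangles_per_vertex(adj)
inside = triangles_per_vertex(adj, members={1, 2, 3})
```

Comparing the two counts for a vertex shows how much of its triangle structure is closed within its community; here vertex 3 lies on two triangles overall but only one inside `{1, 2, 3}`.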