Distributed edge partitioning for trillion-edge graphs

Hanai, Masahiro; Suzumura, Toyotaro; Tan, Wen Jun; Liu, Elvis; Théodoropoulos, Georgios; Cai, Wentong

doi:10.14778/3358701.3358706

Cited by 37 publications

(29 citation statements)

References 42 publications


“…NE selects edge gradually to fully fill each partition and this approach performs quite well. M. Hanai et al [14] proposed a follow-up study. They reformed NE to distribute its approach, this proposition can process trillion-edge graphs and achieves better performance in terms of running time.…”

Section: B Edge Partitioning (mentioning)

confidence: 99%

WSGP: A Window-based Streaming Graph Partitioning Approach

Orgerie et al. 2021

2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid)


Graph partitioning, a preliminary step of distributed graph processing, has been attracting increasing attention in the last decade. A high quality graph partitioning algorithm should facilitate graph processing by minimizing the communication overhead and maintaining the load balancing among distributed computing units. Offline partitioning algorithms usually require the knowledge of a complete graph, and therefore are not well suited to handling massive graph-structured data. On the contrary, streaming partitioning algorithms take edges or vertices as a stream and make partitioning decisions on the fly. However, the streaming manner faces dilemmas from time to time because of a lack of knowledge. Furthermore, an unmindful partitioning decision in such a dilemma could significantly decrease the partition quality. In this paper, we propose a novel window-based streaming graph partitioning algorithm (WSGP). WSGP leverages a greedy-based heuristic to perform edge partitioning. When facing a decision dilemma, WSGP utilizes a size-bounded window to buffer the edges. When the window is fully filled, an edge is popped and assigned to a partition. The assignment is decided by knowledge obtained from both the edges already settled and the ones still cached in the buffer window. Our experiments take into account various real-world benchmark graphs. The experimental results demonstrate that WSGP consistently has a smaller replication factor than the state-of-the-art algorithms by up to 23%, at a limited cost in terms of memory and comprehensive running time.
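The replication factor named in the abstract above is the usual quality metric for edge (vertex-cut) partitioning: the average number of partitions in which each vertex appears. The following is a minimal sketch of that metric, not code from the WSGP paper; the function name `replication_factor` and the dictionary-based input format are assumptions made for illustration.

```python
from collections import defaultdict


def replication_factor(assignment):
    """Average number of partitions each vertex is replicated in.

    `assignment` (hypothetical format) maps an edge (u, v) to a partition id.
    """
    parts_of = defaultdict(set)  # vertex -> set of partitions holding one of its edges
    for (u, v), part in assignment.items():
        parts_of[u].add(part)
        parts_of[v].add(part)
    if not parts_of:
        return 0.0  # no edges, so nothing is replicated
    return sum(len(parts) for parts in parts_of.values()) / len(parts_of)


# A 4-cycle split into two partitions: vertices 1 and 3 appear in both.
example = {(1, 2): 0, (2, 3): 0, (3, 4): 1, (4, 1): 1}
print(replication_factor(example))  # 1.5
```

A value of 1.0 means no vertex is replicated; larger values mean more cross-partition synchronization during distributed processing.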


“…The communication cost stems from the vertices/edges spanning computing nodes to ensure the synchronization among all computing nodes. Unfortunately, the graph partition problem with these two constraints is proved to be an NP-hard problem [10], so it is often solved by heuristic methods.…”

Section: Related Work (mentioning)

confidence: 99%

“…SWR [23] resorts the edges in the sliding window to move the edges with low-degree vertices upfront so these edges are less likely to span different computing nodes. Distributed NE [10] selects initial multiple random vertices and then greedily expands each edge set in parallel such that the increase of the vertex cuts becomes minimal, which can allocate most edges in a locally optimal way and seldom uses the random allocation. The locality of real-world graphs also implies many adjacent lists share a lot of common out-neighbors, which is named by target vertices in TSH [24].…”

Section: Related Work (mentioning)

confidence: 99%
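The greedy expansion described in the statement above can be illustrated with a sequential, single-machine sketch of the neighbor-expansion idea: grow one edge set at a time from a random seed, always expanding the boundary vertex that pulls in the fewest new vertices. This is not the distributed algorithm of the cited paper, which expands all edge sets in parallel; the function name `neighbor_expansion`, the capacity rule, and the tie-breaking are assumptions made for illustration.

```python
import random
from collections import defaultdict


def neighbor_expansion(edges, num_parts, seed=0):
    """Assign each undirected edge to one of num_parts partitions (sequential sketch)."""
    rng = random.Random(seed)
    adj = defaultdict(set)  # adjacency over edges that are still unassigned
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    total = sum(len(nbrs) for nbrs in adj.values()) // 2
    capacity = -(-total // num_parts)  # ceil(total / num_parts)
    assignment = {}

    for part in range(num_parts):
        core, boundary = set(), set()
        size = 0

        def add_to_boundary(y):
            # Pull y into the boundary and claim its unassigned edges into the boundary.
            nonlocal size
            boundary.add(y)
            for z in list(adj[y]):
                if z in boundary and size < capacity:
                    assignment[(min(y, z), max(y, z))] = part
                    adj[y].discard(z)
                    adj[z].discard(y)
                    size += 1

        while size < capacity:
            candidates = [v for v in boundary - core if adj[v]]
            if candidates:
                # Expand the vertex that would add the fewest new boundary vertices.
                x = min(candidates, key=lambda v: len(adj[v] - boundary))
            else:
                remaining = [v for v in adj if adj[v]]
                if not remaining:
                    break  # every edge has been assigned
                x = rng.choice(remaining)  # random (re)seed for this partition
                add_to_boundary(x)
            core.add(x)
            for y in list(adj[x]):
                add_to_boundary(y)
    return assignment


# Two triangles joined by a bridge edge, split into two partitions.
graph = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
parts = neighbor_expansion(graph, num_parts=2)
```

Because each partition is filled to capacity before the next one starts, the edge counts stay balanced, while the min-new-neighbors choice keeps most edges of a vertex in the same partition.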

GAP: Genetic Algorithm Based Large-Scale Graph Partition in Heterogeneous Cluster

Cui, Zhou, et al. 2020

IEEE Access


A graph is an important model for describing various networks, and its scale keeps growing with the development of communication and information technology. The analysis of large-scale graphs requires distributed graph processing systems, and graph partitioning is the basis of these systems. Almost all existing graph partitioning algorithms are designed for homogeneous clusters and do not consider the differences among computing nodes in heterogeneous clusters. This paper proposes GAP, a Genetic Algorithm based graph Partitioning algorithm, to solve this problem. GAP aims to reduce the total processing time on a heterogeneous cluster by partitioning graphs according to the computing power of the computing nodes. GAP first partitions the graph in a balanced way, and then uses a genetic algorithm to transfer vertices so as to reduce cut edges. GAP can balance the processing time of computing nodes and reduce the communication time among them. Experiments performed on a heterogeneous cluster demonstrate that GAP outperforms Hash.


“…Existing partitioning algorithms can be divided into two categories: In-memory algorithms [30,44,55,66] and streaming algorithms [28,32,47,51,64]. In-memory algorithms load the complete graph into memory, and, hence, have full flexibility to assign any edge to any partition at any time.…”

Section: Introduction (mentioning)

confidence: 99%

“…Streaming algorithms consume little memory, but even though they have been improved by sophisticated techniques such as window-based streaming [47] and multi-pass streaming [48], they do not yield the same partitioning quality on all graphs as the best in-memory algorithms. In current graph partitioning systems, the user has to decide for one of the two options, and then either provide a very large machine (or a cluster of machines) and get good partitioning quality [30,44,55,66] or a small machine and get worse partitioning quality [28,32,47,51,64].…”

Section: Introduction (mentioning)

confidence: 99%

Hybrid Edge Partitioner: Partitioning Large Power-Law Graphs under Memory Constraints

Mayer, Jacobsen 2021

Proceedings of the 2021 International Conference on Management of Data


Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity (optimal graph partitioning is NP-hard), another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory for partitioning is not economical or feasible. Currently, the common approach to reduce memory overhead is to rely on streaming partitioning algorithms. While the latest streaming algorithms lead to reasonable partitioning quality on some graphs, they are still not completely competitive with in-memory partitioners. In this paper, we propose a new system, Hybrid Edge Partitioner (HEP), that can partition graphs that fit partly into memory while yielding a high partitioning quality. HEP can flexibly adapt its memory overhead by separating the edge set of the graph into two sub-sets. One sub-set is partitioned by NE++, a novel, efficient in-memory algorithm, while the other sub-set is partitioned by a streaming approach. Our evaluations on large real-world graphs show that in many cases, HEP outperforms both in-memory partitioning and streaming partitioning at the same time. Hence, HEP is an attractive alternative to existing solutions that cannot fine-tune their memory overheads. Finally, we show that using HEP, we achieve a significant speedup of distributed graph processing jobs on Spark/GraphX compared to state-of-the-art partitioning algorithms.

CCS Concepts: Information systems → Graph-based database models; Theory of computation → Graph algorithms analysis.

