Experimental analysis of distributed graph systems

Ammar, Khaled; Özsu, M. Tamer

doi:10.14778/3231751.3231764

Cited by 29 publications

(14 citation statements)

References 39 publications

(61 reference statements)


“…The parallelization of this step is done by Apache Spark, which also de-serializes the gzipped input files. Based on existing studies of graph processing frameworks [2], we assume that for larger datasets and more complex graph summaries, e. g., using the k-chaining parameterization, multi-core performance will scale beyond 4 cores. Apache Spark is a state-of-the-art processing framework [2]; optimizing it is beyond the scope of this article.…”

Section: Discussion (mentioning)

confidence: 99%

Incremental and Parallel Computation of Structural Graph Summaries for Evolving Graphs

Blume

Richerby

Scherp

2020

Proceedings of the 29th ACM International Conference on Information & Knowledge Management


Graph summarization is the task of finding condensed representations of graphs such that a chosen set of (structural) subgraph features in the graph summary are equivalent to the input graph. Existing graph summarization algorithms are tailored to specific graph summary models, only support one-time batch computation, are designed and implemented for a specific task, or evaluated using static graphs. Our novel, incremental, parallel algorithm addresses all these shortcomings. We support various structural graph summary models defined in our formal language FLUID. All graph summaries defined with FLUID can be updated in time O(∆ · d^k), where ∆ is the number of additions, deletions, and modifications to the input graph, d is its maximum degree, and k is the maximum distance in the subgraphs considered. We empirically evaluate the performance of our algorithm on benchmark and real-world datasets. Our experiments show that, for commonly used summary models and datasets, the incremental summarization algorithm almost always outperforms its batch counterpart, even when about 50% of the graph database changes. The source code and the experimental results are openly available for reproducibility and extensibility.


“…The relative performance of the FPGA cluster is significantly better, consuming order-of-magnitude less energy than the Xeon cluster on the same workload. One of the characteristics of distributed graph processing systems is that a large number of machines is usually needed to provide a significant advantage over a nondistributed solution to the same problem [14]. This distribution overhead does not have such a big effect on the FPGA cluster, with hardware support for the programming model, along with efficient networking.…”

Section: Methods (mentioning)

confidence: 99%

“…A recent study [14] explores the performance of three distributed graph processing systems based on Google's vertex-centric programming model [7], including the Apache Giraph system previously used at Facebook [16]. All systems were evaluated on a conventional 128-machine cluster, and a modern system called Blogel [17] was declared best performer.…”

Section: Case Study: Distributed Graph Processing (mentioning)

confidence: 99%

Tinsel: A Manythread Overlay for FPGA Clusters

Naylor

Moore

Thomas

2019

2019 29th International Conference on Field Programmable Logic and Applications (FPL)


Commodity FPGA boards with advanced networking facilities have great potential in the construction of high-performance compute clusters that scale. However, low-level design tools and long synthesis times are major barriers to productivity for application developers. In this paper, we explore the potential of a distributed soft-processor overlay, programmed in software at a high-level of abstraction, to deliver a useful level of performance for FPGA clusters. In particular, we demonstrate the use of hardware multithreading to achieve a fast, space-efficient, high-throughput overlay, and compare a 12-FPGA instance of it (12,288 RISC-V threads) against a conventional Xeon cluster on the problem of distributed graph processing.


“…In addition, various weight performance evaluations have resulted in vertex replication ratios and low communication costs, which show the best PageRank performance when = 0.7, = 0.1, and = 0.2. The vertex replication ratio refers to the number of vertices replicated between nodes. It is closely related to communication cost testing because the lower the vertex replication ratio is, the lower the communication volume is.…”

Section: Performance Evaluation (mentioning)

confidence: 99%

Dynamic Graph Partitioning Scheme for Supporting Load Balancing in Distributed Graph Environments

et al. 2021


As dynamic graph data have been actively used, incremental graph partition schemes have been studied to efficiently store and manage large graphs. In this paper, we propose a vertex-cut based novel incremental graph partitioning scheme that supports load balancing in a distributed environment. The proposed scheme chooses the load of each node that considers its storage utilization and throughput as the partitioning criterion. The proposed scheme defines hot data that means a particular vertex frequently searched among graphs requested by queries. We manage and utilize hot data for graph partitioning. Finally, we perform vertex-cut based dynamic graph partitioning by using a vertex replication index, the load of each node, and hot data to distribute the load evenly in a distributed environment. In order to verify the superiority of the proposed partitioning scheme, we compare it with the existing partitioning schemes through a variety of performance evaluations.

