2018
DOI: 10.14778/3231751.3231764

Experimental analysis of distributed graph systems

Abstract: This paper evaluates eight parallel graph processing systems: Hadoop, HaLoop, Vertica, Giraph, GraphLab (PowerGraph), Blogel, Flink Gelly, and GraphX (SPARK) over four very large datasets (Twitter, World Road Network, UK 200705, and ClueWeb) using four workloads (PageRank, WCC, SSSP and K-hop). The main objective is to perform an independent scale-out study by experimentally analyzing the performance, usability, and scalability (using up to 128 machines) of these systems. In addition to performance results, w…

Cited by 29 publications (14 citation statements)
References 39 publications (61 reference statements)
“…The parallelization of this step is done by Apache Spark, which also de-serializes the gzipped input files. Based on existing studies of graph processing frameworks [2], we assume that for larger datasets and more complex graph summaries, e.g., using the k-chaining parameterization, multi-core performance will scale beyond 4 cores. Apache Spark is a state-of-the-art processing framework [2]; optimizing it is beyond the scope of this article.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The relative performance of the FPGA cluster is significantly better, consuming order-of-magnitude less energy than the Xeon cluster on the same workload. One of the characteristics of distributed graph processing systems is that a large number of machines is usually needed to provide a significant advantage over a nondistributed solution to the same problem [14]. This distribution overhead does not have such a big effect on the FPGA cluster, with hardware support for the programming model, along with efficient networking.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…A recent study [14] explores the performance of three distributed graph processing systems based on Google's vertex-centric programming model [7], including the Apache Giraph system previously used at Facebook [16]. All systems were evaluated on a conventional 128-machine cluster, and a modern system called Blogel [17] was declared best performer.…”
Section: Case Study: Distributed Graph Processing
Citation type: mentioning
confidence: 99%
“…In addition, various weight performance evaluations have resulted in vertex replication ratios and low communication costs, which show the best PageRank performance when = 0.7, = 0.1, and = 0.2. The vertex replication ratio refers to the number of vertices replicated between nodes. It is closely related to communication cost testing because the lower the vertex replication ratio is, the lower the communication volume is.…”
Section: Performance Evaluation
Citation type: mentioning
confidence: 99%