GraphGen: An FPGA Framework for Vertex-Centric Graph Computation

Nurvitadhi, Eriko; Weisz, Gabriel; Wang, Yu; Hurkat, Skand; Nguyen, Marie; Hoe, James C.; Martínez, José F.; Guestrin, Carlos

doi:10.1109/fccm.2014.15

Cited by 93 publications

(52 citation statements)

References 9 publications


“…Graph processing plays an important role in many real-world applications, e.g., ranking the web sites [1], analysing the social networks [2], and streaming applications [3]. Therefore, a large number of research efforts have been made to build the dedicated hardware that can execute graph applications with more efficiency than what the general-purpose processors and systems can provide [4]-[7].…”

Section: Introduction (mentioning)

confidence: 99%

An efficient graph accelerator with parallel data conflict management

Yao, Zheng, Jin et al. 2018

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques


Graph-specific computing with the support of dedicated accelerators has greatly improved graph processing in both efficiency and energy. Nevertheless, their data conflict management is still essentially sequential when a vertex needs a large number of conflicting updates at the same time, leading to prohibitive performance degradation. This is particularly true for processing natural graphs. In this paper, we have the insight that the atomic operations for the vertex updating of many graph algorithms (e.g., BFS, PageRank and WCC) are typically incremental and simplex. This allows us to parallelize the conflicting vertex updates in an accumulative manner. We architect a novel graph-specific accelerator that can simultaneously process atomic vertex updates for massive parallelism on the conflicting data access while ensuring correctness. A parallel accumulator is designed to remove the serialization in atomic protection for conflicting vertex updates by merging their results in parallel. Our implementation on Xilinx Virtex UltraScale+ XCVU9P with a wide variety of typical graph algorithms shows that our accelerator achieves an average throughput of 2.36 GTEPS as well as up to 3.14x performance speedup in comparison with the state-of-the-art ForeGraph (single-chip version).

Input: Graph G = (V, E), root vertex r
Output: Distance of each ∈ V, dis[ ]
Q ← r;
dis[r] = 0;
while Q is not empty do
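The accumulative merging described in the abstract above can be illustrated in software. The following is a minimal Python sketch, not the accelerator's design: it assumes each update is a `(vertex, value)` pair and that the combine operation is associative and commutative (for example `min` for BFS levels or addition for PageRank contributions). The names `tree_reduce` and `merge_conflicting_updates` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Combine = Callable[[int, int], int]


def tree_reduce(values: List[int], combine: Combine) -> int:
    """Merge values pairwise, level by level, the way a reduction tree would.

    Assumes `values` is non-empty and `combine` is associative.
    """
    level = list(values)
    while len(level) > 1:
        # Combine neighbouring pairs; each pair is independent of the others.
        merged = [combine(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd element passes through unchanged
        level = merged
    return level[0]


def merge_conflicting_updates(updates: Iterable[Tuple[int, int]],
                              combine: Combine) -> Dict[int, int]:
    """Group updates by destination vertex and merge each group into one value."""
    by_vertex: Dict[int, List[int]] = defaultdict(list)
    for vertex, value in updates:
        by_vertex[vertex].append(value)
    return {v: tree_reduce(vals, combine) for v, vals in by_vertex.items()}


# Three updates target vertex 3 at once; one targets vertex 5.
updates = [(3, 2), (3, 1), (5, 4), (3, 2)]
bfs_levels = merge_conflicting_updates(updates, min)                 # keep smallest level
rank_sums = merge_conflicting_updates(updates, lambda a, b: a + b)   # accumulate contributions
```

Because the pairs inside each level do not depend on one another, the merge needs a number of steps logarithmic in the number of conflicting updates rather than one step per update, which is the property the abstract attributes to accumulating in parallel.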



“…This work provides a system that uses 3D integration technology, and tries to maximize the available memory bandwidth. On the other hand, GraphGen [42] is a framework to create application-specific synthesized graph processors and memory layout for FPGAs. GraphGen also uses a vertex-centric execution model to represent graph applications.…”

Section: B Custom and Reconfigurable Logic Accelerators (mentioning)

confidence: 99%

Hardware accelerator design for data centers

Yesil, Özdal, Ayupov et al. 2015

2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)


As the size of available data is increasing, it is becoming inefficient to scale the computational power of traditional systems. To overcome this problem, customized application-specific accelerators are becoming integral parts of modern system on chip (SOC) architectures. In this paper, we summarize existing hardware accelerators for data centers and discuss the techniques to implement and embed them along with the existing SOCs.
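The citation statement for this entry notes that GraphGen represents graph applications with a vertex-centric execution model. As a rough software illustration of that style of programming (a sketch under stated assumptions, not GraphGen's actual interface or hardware behavior), the Python below assumes the application is a per-vertex `update` function that sees only a vertex's own value and its neighbors' values, and that a driver applies it to every vertex until nothing changes. The names `run_vertex_program` and `min_label` are hypothetical.

```python
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]            # adjacency list: vertex -> neighbors
Update = Callable[[int, List[int]], int]


def run_vertex_program(graph: Graph, values: Dict[int, int],
                       update: Update, max_iters: int = 100) -> Dict[int, int]:
    """Apply `update` to every vertex per iteration until the values stop changing."""
    values = dict(values)
    for _ in range(max_iters):
        # Each vertex reads the previous iteration's values of its neighbors.
        new_values = {v: update(values[v], [values[n] for n in graph[v]])
                      for v in graph}
        if new_values == values:
            break
        values = new_values
    return values


def min_label(own: int, neighbor_values: List[int]) -> int:
    """Per-vertex program for connected components: adopt the smallest label seen."""
    return min([own, *neighbor_values])


# Two components: {0, 1, 2} and {3, 4}; every vertex starts with its own id as label.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
labels = run_vertex_program(graph, {v: v for v in graph}, min_label)
```

The point of the model is that the application author writes only the per-vertex function; scheduling and memory access are left to the framework, which is what makes it a candidate for automatic hardware generation.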


“…Calculate leverage, p-value and z-score according (1); Algorithm 1: The complete Link Assessment algorithm, calculating the similarity measures presented in 2014 [11] or the Graphlet Counting Case Study from Betkaoui et al. in 2011 [12] that generate specific data processing engines for particular graph operations. Both approaches aim at optimizing the memory accessing schemes for the dynamic random-access memory (DRAM) in order to fully exploit the available memory bandwidths.…”

Section: Data (mentioning)

confidence: 99%

A Custom Computing System for Finding Similarities in Complex Networks

Brugger, Grigorovici, Jung et al. 2015

2015 IEEE Computer Society Annual Symposium on VLSI


Complex graphs are at the heart of today's big data challenges like recommendation systems, customer behavior modeling, or incident detection systems. One recurring task in these fields is the extraction of network motifs, recurring and statistically significant subgraphs. In this work we propose a precisely tailored embedded architecture for computing similarities based on one special network motif, the co-occurrence. It is based on efficient and scalable building blocks that exploit well-tuned algorithmic refinements and an optimized graph data representation approach. On chip, our solution features a customized cache design and a lightweight data path that allows the system to perform over 10,000 graph operations per cycle on each chip. We provide detailed area, energy, and timing results for a 28 nm ASIC process and DDR3 memory devices. Compared to an Intel cluster, our proposed solution uses 44x less memory and is 224x more energy efficient.

