Energy Efficient Architecture for Graph Analytics Accelerators

Özdal, Mustafa; Yesil, Serif; Kim, Taemin; Ayupov, Andrey; Greth, John; Burns, Steven; Öztürk, Özcan

doi:10.1109/isca.2016.24

Cited by 103 publications

(39 citation statements)

References 27 publications


“…The area, power, and access latency of the on-chip scratchpad memory are estimated using Cacti 6.5 [1]. Since Cacti only supports down to 32 nm technologies, we apply four different scaling factors to convert them to 12 nm technology as shown in [33,36]. The energy of HBM 1.0 is estimated with 7 pJ/bit as in [32,41].…”

Section: Methods (mentioning)

confidence: 99%
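The 7 pJ/bit figure quoted above turns into an energy estimate by simple arithmetic. A minimal sketch, assuming a hypothetical helper name and an illustrative 1 GiB of off-chip traffic (neither comes from the cited paper); the 32 nm to 12 nm scaling factors mentioned in the quote are not modeled here:

```python
# Energy per bit for HBM 1.0, as stated in the quoted citation statement.
HBM_PJ_PER_BIT = 7.0


def hbm_energy_joules(bytes_transferred: int, pj_per_bit: float = HBM_PJ_PER_BIT) -> float:
    """Estimate off-chip access energy in joules for a given traffic volume.

    Hypothetical helper for illustration: energy = bits moved * energy per bit.
    """
    bits = bytes_transferred * 8
    return bits * pj_per_bit * 1e-12  # 1 pJ = 1e-12 J


# Illustrative traffic volume (an assumption, not a measured value): 1 GiB.
energy = hbm_energy_joules(1 << 30)
print(f"{energy:.4f} J")  # roughly 0.06 J
```

Under these assumptions, moving 1 GiB through HBM 1.0 costs about 0.06 J, which shows why off-chip traffic is a common target in accelerator energy budgets.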


HyGCN: A GCN Accelerator with Hybrid Architecture

Yan, Deng, et al. 2020

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)



Inspired by the great success of neural networks, graph convolutional neural networks (GCNs) have been proposed to analyze graph data. GCNs mainly include two phases with distinct execution patterns. The Aggregation phase behaves like graph processing, showing a dynamic and irregular execution pattern. The Combination phase acts more like a neural network, presenting a static and regular execution pattern. These hybrid execution patterns require a design that alleviates irregularity and exploits regularity. Moreover, to achieve higher performance and energy efficiency, the design needs to leverage the high intra-vertex parallelism in the Aggregation phase, the highly reusable inter-vertex data in the Combination phase, and the opportunity to fuse phase-by-phase execution introduced by the new features of GCNs. However, existing architectures fail to address these demands. In this work, we first characterize the hybrid execution patterns of GCNs on an Intel Xeon CPU. Guided by the characterization, we design a GCN accelerator, HyGCN, using a hybrid architecture to efficiently perform GCNs. Specifically, first, we build a new programming model to exploit the fine-grained parallelism for our hardware design. Second, we propose a hardware design with two efficient processing engines to alleviate the irregularity of the Aggregation phase and leverage the regularity of the Combination phase. Besides, these engines can exploit various forms of parallelism and reuse highly reusable data efficiently. Third, we optimize the overall system via an inter-engine pipeline for inter-phase fusion and priority-based off-chip memory access coordination to improve off-chip bandwidth utilization. Compared to the state-of-the-art software framework running on an Intel Xeon CPU and an NVIDIA V100 GPU, our work achieves on average a 1509× speedup with 2500× energy reduction and a 6.5× speedup with 10× energy reduction, respectively.
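The two phases named in the abstract can be illustrated with a toy example. This is a minimal sketch, not HyGCN's programming model: the sum aggregator with an implicit self-loop, the ReLU activation, and all function and variable names are assumptions chosen for illustration.

```python
def aggregate(neighbors, features):
    """Aggregation phase (irregular): sum each vertex's own feature vector
    with those of its neighbors. Access pattern depends on the graph."""
    out = []
    for v, nbrs in enumerate(neighbors):
        acc = list(features[v])  # start from the vertex's own features
        for u in nbrs:
            for i, x in enumerate(features[u]):
                acc[i] += x
        out.append(acc)
    return out


def combine(aggregated, weights):
    """Combination phase (regular): the same dense vector-matrix product,
    followed by ReLU, is applied to every vertex."""
    columns = list(zip(*weights))  # weights is (in_dim x out_dim)
    return [
        [max(0.0, sum(x * w for x, w in zip(row, col))) for col in columns]
        for row in aggregated
    ]


# Toy graph: vertex 0 is connected to vertices 1 and 2.
neighbors = [[1, 2], [0], [0]]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights = [[1.0, -1.0], [0.5, 1.0]]

hidden = combine(aggregate(neighbors, features), weights)
print(hidden)  # [[4.5, 0.0], [1.5, 0.0], [4.0, 0.0]]
```

The inner loops of `aggregate` follow the neighbor lists, so their length and memory accesses vary per vertex, while `combine` reuses the same weight matrix for every vertex.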


Section: Methods (mentioning)

confidence: 99%

“…GCNs demand specialized architecture design. With the emergence of graph analytics and neural network workloads, many hardware architecture designs have been proposed to accelerate these workloads [7,8,17,22,33]. For example, Graphicionado [17] is tailored for graph analytics, while TPU [22] focuses on the acceleration of neural networks.…”

Section: Related Work (mentioning)

confidence: 99%

HyGCN: A GCN Accelerator with Hybrid Architecture

Yan, Deng, et al. 2020

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)


“…If incoming edges are used in the edge array, this layout is called Compressed Sparse Column (CSC). The compressed adjacency list graph is relatively compact and beneficial to many graph accelerators [29,71]. Note that the edges of each vertex are stored sequentially.…”

Section: Graph Layout Reorganization (mentioning)

confidence: 99%
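The CSC layout described in the statement above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `build_csc`, the `(src, dst)` edge-tuple format, and the toy graph are hypothetical and not taken from the cited survey.

```python
def build_csc(num_vertices, edges):
    """Build a Compressed Sparse Column layout from (src, dst) edge pairs.

    Returns (offsets, sources): the incoming edges of vertex v are stored
    sequentially in sources[offsets[v]:offsets[v + 1]].
    """
    # Count the incoming edges of every vertex.
    counts = [0] * num_vertices
    for _, dst in edges:
        counts[dst] += 1

    # A prefix sum over the counts gives each vertex's start offset.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]

    # Scatter each edge's source into its destination's slot.
    cursor = offsets[:-1]  # slicing copies, so offsets is not modified
    sources = [0] * len(edges)
    for src, dst in edges:
        sources[cursor[dst]] = src
        cursor[dst] += 1
    return offsets, sources


edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
offsets, sources = build_csc(4, edges)
print(offsets)  # [0, 1, 2, 5, 5]
print(sources)  # [2, 0, 0, 1, 3]
```

Here vertex 2 has three incoming edges, stored contiguously as `sources[2:5]`, which is the sequential per-vertex storage the statement refers to. Indexing by source instead of destination would give the Compressed Sparse Row (CSR) counterpart.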

“…Specifically, in terms of graph processing, a large number of relevant studies have built their graph processing accelerators on FPGAs [24-28] and ASICs [16,29-31]. Evaluation on these…”

Section: Introduction (mentioning)

confidence: 99%


A Survey on Graph Processing Accelerators: Challenges and Opportunities

Gui, Zheng, et al. 2019

J. Comput. Sci. Technol.


Graphs are a well-known data structure for representing relationships in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing systems that improve performance and/or energy efficiency on traditional architectures, dedicated hardware solutions, also referred to as graph processing accelerators, are essential and emerging to provide benefits significantly beyond what pure software solutions can offer. In this paper, we conduct a systematic survey of the design and implementation of graph processing accelerators. Specifically, we review the relevant techniques in three core components of a graph processing accelerator: preprocessing, parallel graph computation, and runtime scheduling. We also examine the benchmarks and results in existing studies for evaluating a graph processing accelerator. Interestingly, we find that there is no absolute winner for all three aspects of graph acceleration, due to the diverse characteristics of graph processing and the complexity of hardware configurations. We finally discuss several challenges in detail and explore opportunities for future research.
