2022
DOI: 10.48550/arxiv.2203.00158
Preprint
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks

Abstract: Graph convolutional neural networks (GCNs) have emerged as a key technology in various application domains where the input data is relational. A unique property of GCNs is that their two primary execution stages, aggregation and combination, exhibit drastically different dataflows. Consequently, prior GCN accelerators tackle this research space by casting the aggregation and combination stages as a series of sparse-dense matrix multiplications. However, prior work frequently suffers from inefficient data movement…
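The abstract describes casting GCN aggregation as a sparse-dense GEMM, and the citation statements below note that GROW organizes this around Gustavson's algorithm (a row-wise product). As a rough illustration only, here is a minimal software sketch of that row-wise dataflow on a CSR matrix; the function name and CSR layout are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch (not from the paper): Gustavson-style row-wise-product
# sparse-dense GEMM, the dataflow GROW's accelerator is organized around.
import numpy as np

def spmm_row_product(indptr, indices, data, B):
    """Multiply a CSR sparse matrix A by a dense matrix B, one row of A at a time.

    For each nonzero A[i, k], row k of B is scaled by A[i, k] and accumulated
    into output row i. The output row stays resident ("row-stationary") while
    the needed rows of B stream in, which is the memory-efficiency argument
    behind row-wise products for sparse-dense GEMM.
    """
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            C[i] += data[p] * B[indices[p]]
    return C
```

In a GCN, A would be the (sparse) normalized adjacency matrix and B the dense node-feature matrix, so one such SpMM implements the aggregation stage.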

Cited by 1 publication (2 citation statements)
References 32 publications
“…They also propose FlowGNN [43], which can flexibly support the majority of message-passing GNNs. [11] proposes a GCN accelerator named GROW, which uses Gustavson's algorithm to architect a sparse-dense GEMM accelerator with a row-wise product. [44] proposes MultiGCN, which balances network latency and network bandwidth for large-scale GCNs in multi-node acceleration systems.…”
Section: Related Work
confidence: 99%
“…Hence, accelerating GNN inference using reconfigurable accelerators such as FPGAs is essential at the LHC, since it would enable sophisticated processing to run in real time on the data stream from detectors with superior accuracy. Many existing FPGA-based GNN accelerators are designed with a single-engine architecture that processes layers or sub-layers (blocks) repeatedly, as GPUs do, so the networks are executed in a recurrent fashion [6,7,8,9,10,11]. However, this is not efficient for GNN execution when targeting small graphs with requirements of ultra-low latency and high throughput for scientific applications, e.g., particle identification.…”
Section: Introduction
confidence: 99%