Proceedings of the 49th Annual International Symposium on Computer Architecture 2022
DOI: 10.1145/3470496.3527439

Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network

Abstract: Graph neural network (GNN) is a promising emerging application for link prediction, recommendation, etc. Existing hardware innovation is limited to single-machine GNN (SM-GNN); however, enterprises usually adopt huge graphs with large-scale distributed GNN (LSD-GNN), which has to be carried out with distributed in-memory storage. LSD-GNN is very different from SM-GNN in terms of system architecture demands, workflow and operators, and hence characterizations. In this paper, we first quantitatively characterize…

Cited by 14 publications (6 citation statements); references 52 publications (48 reference statements).
“…Static GNN. Over the last few years, there have been substantial research achievements for static GNN acceleration on GPUs, covering general runtime frameworks [7,26,47,52], SpMM-like aggregation optimization [9,15,16,48], and the scaling of distributed training [18,23,41,43,45,46,49].…”
Section: GNN Acceleration
Mentioning confidence: 99%
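To make the "SpMM-like aggregation" mentioned in the statement above concrete, the sketch below expresses one GNN aggregation step as a sparse (CSR) adjacency matrix multiplied by a dense feature matrix. The shapes, density, and variable names are illustrative assumptions, not values from the paper.

# Minimal sketch of SpMM-style GNN aggregation (illustrative values only):
# neighbor features are combined by multiplying a sparse CSR adjacency
# matrix with a dense node-feature matrix in a single SpMM call.
import numpy as np
import scipy.sparse as sp

num_nodes, feat_dim = 5, 8
adj = sp.random(num_nodes, num_nodes, density=0.3, format="csr")  # sparse graph
features = np.random.rand(num_nodes, feat_dim)                    # dense features

# One aggregation step: row v of `aggregated` is the weighted sum of the
# feature vectors of v's neighbors.
aggregated = adj @ features
print(aggregated.shape)  # (5, 8)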
“…GNN models can also be trained full-graph; this approach does not require the sampling stage. However, full-graph training causes a large memory footprint [18], [19] that may not fit in device memory (e.g., FPGA local DDR). Therefore, HitGNN focuses on accelerating mini-batch GNN training, as it demonstrates advantages in accuracy and scalability on large graphs and has been adopted by many state-of-the-art GNN frameworks [6], [8], [15], [20].…”
Section: Mini-batch GNN Training
Mentioning confidence: 99%
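For context on the mini-batch training referred to above, the sketch below shows the core idea of neighborhood sampling: only a small L-hop subgraph around each seed node is materialized per step, which is what keeps the footprint within device memory. The adjacency list, fanouts, and function name are hypothetical, not taken from HitGNN or the paper.

import random

def sample_khop(adj_list, seeds, fanouts):
    # Sample up to `fanout` neighbors per frontier node, one hop per fanout entry.
    frontier, sampled = set(seeds), set(seeds)
    for fanout in fanouts:
        next_frontier = set()
        for v in frontier:
            neighbors = adj_list.get(v, [])
            next_frontier.update(random.sample(neighbors, min(fanout, len(neighbors))))
        sampled |= next_frontier
        frontier = next_frontier
    return sampled

# Toy graph: only this small sampled subgraph (not the full graph) is
# loaded for one training step.
adj_list = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 4], 4: [1, 3]}
print(sample_khop(adj_list, seeds=[0], fanouts=[2, 2]))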
“…This is because the computation characteristics of CNN and GNN are quite different: CNN models feature structured input data with high computation intensity, while GNN models … There are also works that accelerate GNN training using multiple FPGAs. [18] accelerates GNN training on a distributed platform, where the graph is stored on multiple nodes. On a distributed platform, the training performance is bottlenecked by the sampling stage.…”
Section: Related Work
Mentioning confidence: 99%
“…A lightweight open-source RISC-V core [54] is used for programmability and control. The access engine [55] is customized for low-latency sampling and supports out-of-order requests for latency hiding. For computation, …”
Section: Evaluation, A. Evaluation Setup, 1) System Configuration
Mentioning confidence: 99%
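As a software analogy of the out-of-order access engine described above (a sketch only; fetch_neighbors and all parameters are hypothetical), latency hiding amounts to keeping many small sampling requests in flight and consuming completions in whatever order they arrive:

from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time

def fetch_neighbors(node_id):
    # Stand-in for a small remote/DDR structure lookup with variable latency.
    time.sleep(random.uniform(0.01, 0.05))
    return node_id, [node_id + 1, node_id + 2]

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch_neighbors, n) for n in range(16)]
    for fut in as_completed(futures):   # completions arrive out of order
        node, nbrs = fut.result()       # overlapping requests hides the latency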
“…1, the sampling phase gathers the graph structure and features from local and remote machines for the subsequent aggregation and combination phases. During graph sampling, nearly 48% of the memory accesses [55] are for graph structure (e.g., node ID and edge offset of the CSR-formatted adjacency matrix), and this kind of memory access is small in size (8-64 bytes) and discontinuous.…”
Section: Introduction
Mentioning confidence: 99%
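The access pattern described above can be illustrated with a plain CSR lookup (the arrays below are illustrative, not from the paper): fetching one node's neighbor list reads two 8-byte edge offsets plus a short run of neighbor IDs, and consecutive lookups land at unrelated addresses because sampled node IDs are essentially random.

import numpy as np

indptr = np.array([0, 2, 5, 6, 8], dtype=np.int64)             # edge offsets
indices = np.array([1, 3, 0, 2, 3, 1, 0, 1], dtype=np.int64)   # neighbor node IDs

def neighbors(node):
    start, end = indptr[node], indptr[node + 1]   # two 8-byte reads
    return indices[start:end]                     # a few more 8-byte reads

for v in np.random.permutation(4):                # scattered (discontinuous) node IDs
    print(int(v), neighbors(v))                   # each lookup touches only tens of bytes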