GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Huang, Guyue; Dai, Guohao; Wang, Yu; Yang, Huazhong

doi:10.1109/sc41405.2020.00076

Cited by 72 publications

(49 citation statements)

References 14 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In the following, we review a comprehensive selection of software frameworks and accelerators, listed in Table 7. The analysis does not include GunRock [154] or GE-SpMM [74] for different reasons. GunRock, despite implementing GraphSAGE in its latest versions, is a graph processing library that does not exploit intra-vertex parallelism.…”

Section: Software Framework and Acceleratorsmentioning

confidence: 99%

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

et al. 2021

View full text Add to dashboard Cite

Graph Neural Networks (GNNs) have exploded onto the machine learning scene in recent years owing to their capability to model and learn from graph-structured data. Such an ability has strong implications in a wide variety of fields whose data are inherently relational, for which conventional neural networks do not perform well. Indeed, as recent reviews can attest, research in the area of GNNs has grown rapidly and has lead to the development of a variety of GNN algorithm variants as well as to the exploration of ground-breaking applications in chemistry, neurology, electronics, or communication networks, among others. At the current stage research, however, the efficient processing of GNNs is still an open challenge for several reasons. Besides of their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, or the need to scale to huge graphs in some applications. In this context, this article aims to make two main contributions. On the one hand, a review of the field of GNNs is presented from the perspective of computing. This includes a brief tutorial on the GNN fundamentals, an overview of the evolution of the field in the last decade, and a summary of operations carried out in the multiple phases of different GNN algorithm variants. On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled.

show abstract

Section: Software Framework and Acceleratorsmentioning

confidence: 99%

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

et al. 2021

View full text Add to dashboard Cite

show abstract

“…Our recent work on distributed-memory GNN training Tripathy et al (2020) showed that communication-avoiding algorithms greatly accelerate GNN training at the expense of increasing memory requirements. The primary workhorse of GNN training and inference is the sparse matrix-dense matrix product (Yang et al, 2018; Huang et al, 2020). The algorithmic research on marginalized graph kernels and communication-avoiding distributed GNN training has been primarily supported by the ASCR Applied Math program.…”

Section: Algebraic Approaches For Graph Algorithms and Combinatorial Problemsmentioning

confidence: 99%

EXAGRAPH: Graph and combinatorial methods for enabling exascale applications

Acer

Azad

Boman

et al. 2021

The International Journal of High Performance Computing Applica

View full text Add to dashboard Cite

Combinatorial algorithms in general and graph algorithms in particular play a critical enabling role in numerous scientific applications. However, the irregular memory access nature of these algorithms makes them one of the hardest algorithmic kernels to implement on parallel systems. With tens of billions of hardware threads and deep memory hierarchies, the exascale computing systems in particular pose extreme challenges in scaling graph algorithms. The codesign center on combinatorial algorithms, ExaGraph, was established to design and develop methods and techniques for efficient implementation of key combinatorial (graph) algorithms chosen from a diverse set of exascale applications. Algebraic and combinatorial methods have a complementary role in the advancement of computational science and engineering, including playing an enabling role on each other. In this paper, we survey the algorithmic and software development activities performed under the auspices of ExaGraph from both a combinatorial and an algebraic perspective. In particular, we detail our recent efforts in porting the algorithms to manycore accelerator (GPU) architectures. We also provide a brief survey of the applications that have benefited from the scalable implementations of different combinatorial algorithms to enable scientific discovery at scale. We believe that several applications will benefit from the algorithmic and software tools developed by the ExaGraph team.

show abstract

“…Then, we use the loaded data (in shared memory) to calculate the corresponding tile of output feature matrix (dense). Inspired by existing works [16,22], load imbalance may severely hurt the performance on the GPU, while we solve this issue through an algorithm-hardware co-design. On the algorithm side, we limit all the filters in the same layer have the same number of un-pruned (non-zero) weights in our pattern-based pruning.…”

Section: Pattern-accelerated Spmm For Sparse Convolutionmentioning

confidence: 99%

ClickTrain

Zhang

Yuan

Niu

et al. 2021

Proceedings of the ACM International Conference on Supercomputing

View full text Add to dashboard Cite

Convolutional neural networks (CNNs) are becoming increasingly deeper, wider, and non-linear because of the growing demand on prediction accuracy and analysis quality. The wide and deep CNNs, however, require a large amount of computing resources and processing time. Many previous works have studied model pruning to improve inference performance, but little work has been done for effectively reducing training cost. In this paper, we propose Click-Train: an efficient and accurate end-to-end training and pruning framework for CNNs. Different from the existing pruning-duringtraining work, ClickTrain provides higher model accuracy and compression ratio via fine-grained architecture-preserving pruning. By leveraging pattern-based pruning with our proposed novel accurate weight importance estimation, dynamic pattern generation and selection, and compiler-assisted computation optimizations, Click-Train generates highly accurate and fast pruned CNN models for direct deployment without any extra time overhead, compared with the baseline training. ClickTrain also reduces the end-to-end time cost of the pruning-after-training method by up to 2.3× with comparable accuracy and compression ratio. Moreover, compared with the state-of-the-art pruning-during-training approach, ClickTrain provides significant improvements both accuracy and compression ratio on the tested CNN models and datasets, under similar limited training time. CCS CONCEPTS• Computing methodologies → Neural networks.

show abstract

GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Cited by 72 publications

References 14 publications

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

EXAGRAPH: Graph and combinatorial methods for enabling exascale applications

ClickTrain

Contact Info

Product

Resources

About