2019 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: 10.1109/hpec.2019.8916498

Accelerating DNN Inference with GraphBLAS and the GPU

Abstract: This work addresses the 2019 Sparse Deep Neural Network Graph Challenge with an implementation based on the GraphBLAS programming model. We demonstrate our solution using GraphBLAST, a GraphBLAS implementation on the GPU, and compare it to SuiteSparse, a GraphBLAS implementation on the CPU. The GraphBLAST implementation is 1.94× faster than SuiteSparse; the primary opportunity to increase performance on the GPU is a higher-performance sparse-matrix-times-sparse-matrix (SpGEMM) …
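The Sparse DNN Graph Challenge kernel that the abstract refers to iterates Y ← ReLU(Y·W + b) over sparse feature and weight matrices, so SpGEMM dominates the runtime. A minimal CPU sketch of one such layer using scipy.sparse (this is not the paper's GraphBLAST or SuiteSparse code; the matrix sizes, densities, and bias value are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(0)

def sparse_dnn_layer(Y, W, bias):
    """One sparse DNN inference layer: SpGEMM, then bias and ReLU.

    Y: sparse feature matrix (inputs x neurons), W: sparse weight matrix.
    Applying the (negative) bias only to stored entries is safe here:
    structural zeros would be clipped back to zero by ReLU anyway.
    """
    Z = Y @ W                            # SpGEMM: the dominant cost
    Z.data = np.maximum(Z.data + bias, 0.0)  # bias + ReLU on nonzeros
    Z.eliminate_zeros()                  # keep the result sparse
    return Z

# Tiny example: 4 inputs, 8 neurons, two layers (sizes are arbitrary)
Y = sparse_random(4, 8, density=0.5, random_state=rng, format="csr")
W1 = sparse_random(8, 8, density=0.3, random_state=rng, format="csr")
W2 = sparse_random(8, 8, density=0.3, random_state=rng, format="csr")
for W in (W1, W2):
    Y = sparse_dnn_layer(Y, W, bias=-0.1)
print(Y.shape, Y.nnz)
```

The GraphBLAS formulation replaces the `@` operator with a masked semiring matrix multiply, which is where the paper locates the main GPU optimization opportunity.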


Cited by 9 publications (3 citation statements)
References 11 publications
“…Motivated by the computational advantages and reduced sizes to handle very large data and models, efficient inference computation on sparse DNNs has attracted significant attention [36]. Parallel algorithms for sparse computations on shared-memory systems are recently proposed (e.g., GPUs [4,27,50,58], multiprocessors [15,48,51]). Since these approaches implement only inference computation and are not used for training, each input data vector can be independently processed and distributed parallelism can be achieved by just splitting the input dataset and replicating DNN models among multiple compute nodes.…”
Section: Related Work
Confidence: 99%
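The citation statement above makes a concrete technical point: because these systems implement only inference, each input row is processed independently, so distributed parallelism amounts to splitting the input dataset and replicating the model. A minimal sketch of that data-parallel pattern using scipy.sparse as a stand-in for the GraphBLAS kernels (the shard boundaries, bias, and helper `infer` are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import random as sparse_random, vstack

rng = np.random.default_rng(1)

def infer(Y, weights):
    """Run all sparse layers on one shard of input rows."""
    for W in weights:
        Z = Y @ W
        Z.data = np.maximum(Z.data - 0.1, 0.0)  # bias then ReLU
        Z.eliminate_zeros()
        Y = Z
    return Y

Y = sparse_random(6, 8, density=0.5, random_state=rng, format="csr")
weights = [sparse_random(8, 8, density=0.4, random_state=rng, format="csr")
           for _ in range(2)]

# "Distributed" run: split input rows into shards, replicate the model,
# process each shard independently, then concatenate the results.
shards = [Y[0:3], Y[3:6]]
distributed = vstack([infer(s, weights) for s in shards], format="csr")

# Matches the single-node result because rows never interact.
single = infer(Y, weights)
assert (distributed != single).nnz == 0
```

In a real deployment each shard would live on a separate compute node with its own copy of the weight matrices; no communication is needed until results are gathered.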
“…These mathematics have been implemented in a variety of software libraries, including the GraphBLAS standard [13]- [16] implemented in the C/Matlab/Octave/Python/Julia languages [17]- [20] and the RedisGraph database [21]; the C-MPI CombBLAS parallel library [22]; and the D4M associative array library in Matlab/Octave/Python/Julia languages [23]- [27] with database bindings to SciDB, Accumulo, and PostGreSQL [28]- [32]. The GraphBLAS standard has further enabled hardware acceleration of these mathematics via multithreading [33], GPUs [34], and special purpose accelerators [35]- [39].…”
Section: Introduction
Confidence: 99%
“…Graph Challenge 2019 Student Innovation Award, Finalist, and Honorable Mention. Sparse DNN execution time vs. number of operations and corresponding model fits for Wang-UCDavis-2019 [64], Wang-PingAn-2019 [65], and Mofrad-UPitt-2019 [66].…”
Confidence: 99%