2019 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: 10.1109/hpec.2019.8916498

Accelerating DNN Inference with GraphBLAS and the GPU

Abstract: This work addresses the 2019 Sparse Deep Neural Network Graph Challenge with an implementation based on the GraphBLAS programming model. We demonstrate our solution using GraphBLAST, a GraphBLAS implementation on the GPU, and compare it to SuiteSparse, a GraphBLAS implementation on the CPU. The GraphBLAST implementation is 1.94× faster than SuiteSparse; the primary opportunity to increase performance on the GPU is a higher-performance sparse-matrix-times-sparse-matrix (SpGEMM) …
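The Sparse DNN Graph Challenge kernel that the abstract refers to iterates Y ← ReLU(Y·W + b) over sparse feature and weight matrices, so SpGEMM dominates the runtime. A minimal CPU sketch of one such layer using scipy.sparse (this is not the paper's GraphBLAST or SuiteSparse code; the matrix sizes, densities, and bias value are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(0)

def sparse_dnn_layer(Y, W, bias):
    """One sparse DNN inference layer: SpGEMM, then bias and ReLU.

    Y: sparse feature matrix (inputs x neurons), W: sparse weight matrix.
    Applying the (negative) bias only to stored entries is safe here:
    structural zeros would be clipped back to zero by ReLU anyway.
    """
    Z = Y @ W                            # SpGEMM: the dominant cost
    Z.data = np.maximum(Z.data + bias, 0.0)  # bias + ReLU on nonzeros
    Z.eliminate_zeros()                  # keep the result sparse
    return Z

# Tiny example: 4 inputs, 8 neurons, two layers (sizes are arbitrary)
Y = sparse_random(4, 8, density=0.5, random_state=rng, format="csr")
W1 = sparse_random(8, 8, density=0.3, random_state=rng, format="csr")
W2 = sparse_random(8, 8, density=0.3, random_state=rng, format="csr")
for W in (W1, W2):
    Y = sparse_dnn_layer(Y, W, bias=-0.1)
print(Y.shape, Y.nnz)
```

The GraphBLAS formulation replaces the `@` operator with a masked semiring matrix multiply, which is where the paper locates the main GPU optimization opportunity.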


Cited by 9 publications (3 citation statements)
References 11 publications
“…Motivated by the computational advantages and reduced sizes to handle very large data and models, efficient inference computation on sparse DNNs has attracted significant attention [36]. Parallel algorithms for sparse computations on shared-memory systems are recently proposed (e.g., GPUs [4,27,50,58], multiprocessors [15,48,51]). Since these approaches implement only inference computation and are not used for training, each input data vector can be independently processed and distributed parallelism can be achieved by just splitting the input dataset and replicating DNN models among multiple compute nodes.…”
Section: Related Work
Confidence: 99%
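The citation statement above makes a concrete technical point: because these systems implement only inference, each input row is processed independently, so distributed parallelism amounts to splitting the input dataset and replicating the model. A minimal sketch of that data-parallel pattern using scipy.sparse as a stand-in for the GraphBLAS kernels (the shard boundaries, bias, and helper `infer` are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import random as sparse_random, vstack

rng = np.random.default_rng(1)

def infer(Y, weights):
    """Run all sparse layers on one shard of input rows."""
    for W in weights:
        Z = Y @ W
        Z.data = np.maximum(Z.data - 0.1, 0.0)  # bias then ReLU
        Z.eliminate_zeros()
        Y = Z
    return Y

Y = sparse_random(6, 8, density=0.5, random_state=rng, format="csr")
weights = [sparse_random(8, 8, density=0.4, random_state=rng, format="csr")
           for _ in range(2)]

# "Distributed" run: split input rows into shards, replicate the model,
# process each shard independently, then concatenate the results.
shards = [Y[0:3], Y[3:6]]
distributed = vstack([infer(s, weights) for s in shards], format="csr")

# Matches the single-node result because rows never interact.
single = infer(Y, weights)
assert (distributed != single).nnz == 0
```

In a real deployment each shard would live on a separate compute node with its own copy of the weight matrices; no communication is needed until results are gathered.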
“…These mathematics have been implemented in a variety of software libraries, including the GraphBLAS standard [13]- [16] implemented in the C/Matlab/Octave/Python/Julia languages [17]- [20] and the RedisGraph database [21]; the C-MPI CombBLAS parallel library [22]; and the D4M associative array library in Matlab/Octave/Python/Julia languages [23]- [27] with database bindings to SciDB, Accumulo, and PostGreSQL [28]- [32]. The GraphBLAS standard has further enabled hardware acceleration of these mathematics via multithreading [33], GPUs [34], and special purpose accelerators [35]- [39].…”
Section: Introduction
Confidence: 99%
“…Graph Challenge 2019 Student Innovation Award, Finalist, and Honorable Mention. Sparse DNN execution time vs. number of operations and corresponding model fits for Wang-UCDavis-2019 [64], Wang-PingAn-2019 [65], and Mofrad-UPitt-2019 [66].…”
Confidence: 99%