2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca.2018.00070
Gist: Efficient Data Encoding for Deep Neural Network Training

Cited by 114 publications (51 citation statements)
References 13 publications
“…The actual computation in the sparse GEMM (described below) still uses single precision (FP32) by converting FP16 data back to FP32. Therefore, the precision lost during conversion has little impact on the overall training performance [10], [33]. We use the second 16 bits to represent the row index.…”
Section: ELLPACK-DIB Based GEMM
confidence: 99%
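The packing this excerpt describes can be made concrete. The snippet below is a minimal sketch, assuming each stored nonzero is a 32-bit word whose low 16 bits hold the FP16 value bits and whose high 16 bits hold the row index; the function names and exact bit layout are hypothetical illustrations, not the citing paper's actual ELLPACK-DIB format.

import numpy as np

def pack_fp16_with_row_index(values_fp32, row_indices):
    # Hypothetical layout: low 16 bits = FP16 value bits, high 16 bits = row index.
    vals_fp16_bits = values_fp32.astype(np.float16).view(np.uint16).astype(np.uint32)
    return (row_indices.astype(np.uint32) << 16) | vals_fp16_bits

def unpack_for_fp32_compute(packed):
    # Recover the row indices and widen the FP16 payload back to FP32 for the GEMM.
    row_indices = (packed >> np.uint32(16)).astype(np.uint16)
    vals_fp32 = (packed & np.uint32(0xFFFF)).astype(np.uint16).view(np.float16).astype(np.float32)
    return vals_fp32, row_indices

vals = np.array([0.15625, -2.5, 3.1415], dtype=np.float32)
rows = np.array([7, 42, 1023], dtype=np.uint16)
recovered, idx = unpack_for_fp32_compute(pack_fp16_with_row_index(vals, rows))
# Any error in `recovered` comes only from the FP32 -> FP16 value conversion;
# the 16-bit row index survives the round trip exactly.

The round trip makes the quoted point explicit: only the value conversion to FP16 loses precision, while the index information is preserved bit-exactly.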
“…The computer architecture community has explored methods to improve execution efficiency and memory usage when processing DNNs. Of the many approaches pursued, leveraging weight/activation sparsity has attracted a lot of attention [8], [9], [10], [11], [12], [13]. Prior studies have explored exploiting sparsity to accelerate DNN computations during both training and inference on customized platforms (e.g., FPGAs and ASICs) [4], [8], [9], [12].…”
Section: Introduction
confidence: 99%
“…Therefore, it is essential to reduce the memory requirements to allow better network training and deployment, such as applying deep CNNs to embedded systems and cell phones. Several studies [4] show that the intermediate layer outputs (feature maps) are the primary contributors to this memory bottleneck. Existing methods, such as model compression [5, 6] and scheduling [7], do not directly address the storage of feature maps.…”
Section: Introduction
confidence: 99%
“…Neural networks are never satiated with their current speed [Lu and Liang, 2018; Yu et al., 2017; Zhao et al., 2017]. From the moment neural networks appeared, research on accelerating them also began [Posewsky and Ziener, 2018; Jain et al., 2018; Zhang et al., 2015]. The representative acceleration methods are mainly built on the FFT [Mathieu et al., 2013], which fully exploits component reuse in the frequency domain.…”
Section: Introduction
confidence: 99%
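The frequency-domain reuse mentioned in the last excerpt is the classic FFT-convolution identity: convolution in the spatial domain becomes a pointwise product after the transform. Below is a minimal sketch assuming 1-D circular convolution; fft_conv1d is a hypothetical helper, not an API from the cited works.

import numpy as np

def fft_conv1d(signal, kernel):
    # Circular convolution via the frequency domain: two forward FFTs,
    # one pointwise multiply, and one inverse FFT replace the sliding window.
    padded = np.zeros_like(signal)
    padded[:len(kernel)] = kernel
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(padded)))

x = np.random.rand(1024)
k = np.random.rand(16)
y = fft_conv1d(x, k)  # matches direct circular convolution up to floating-point error

Once an operand has been transformed, its spectrum can be reused across multiple pointwise products, which is one reading of the component reuse the excerpt refers to.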