2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
DOI: 10.1109/micro50266.2020.00069

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training

Cited by 52 publications (26 citation statements)
References 35 publications
“…To demonstrate the effectiveness of Shift-BNN, we compare it with three training accelerators: Firstly, since Shift-BNN adopts RC-mapping as the fundamental design strategy, we compare it with the RC-accelerator, which adopts the RC-mapping strategy but without the LFSR reversion technique. Secondly, since MN-mapping is commonly used in existing DNN training accelerators [39,64], we employ an MN-accelerator that adopts the MN-mapping strategy without the LFSR reversion technique as the baseline accelerator for generality, which is also used for our preliminary investigation in Sec. 3. Thirdly, to verify the analysis about design alternatives (see Sec. 5), we further test the effectiveness of our LFSR reversion strategy on the MN-accelerator by comparing it with an MN-Shift-accelerator that adopts both the MN-mapping strategy and the LFSR reversion technique.…”
Section: Evaluation, 7.1 Experimental Methodology (mentioning, confidence: 99%)
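The quoted methodology repeatedly names an "LFSR reversion technique" without detail. As general background only, and not a description of Shift-BNN's hardware, a linear-feedback shift register step is invertible, so a pseudo-random sequence generated in the forward pass can be reproduced by stepping the register backwards instead of being stored. The sketch below assumes a standard 16-bit maximal-length polynomial chosen purely for illustration.

```python
# Background sketch only: a Fibonacci LFSR step is invertible, so the
# pseudo-random bits it produced can be regenerated by stepping backwards
# instead of storing them. Taps correspond to x^16 + x^14 + x^13 + x^11 + 1,
# a standard maximal-length choice; this is not Shift-BNN's actual design.

def lfsr_forward(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr_backward(state: int) -> int:
    """Undo one forward step, recovering the previous 16-bit state."""
    low = ((state >> 15) ^ (state >> 1) ^ (state >> 2) ^ (state >> 4)) & 1
    return ((state << 1) & 0xFFFF) | low

# Running forward N steps and then backward N steps returns the seed,
# so the forward pass's random values can be re-derived in reverse order.
seed = 0xACE1
s = seed
for _ in range(1000):
    s = lfsr_forward(s)
for _ in range(1000):
    s = lfsr_backward(s)
assert s == seed
print("recovered seed:", hex(s))
```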
“…DNN training optimization has been extensively studied [39,44,49,60,64]. For example, Eager Pruning [64] and Procrustes [60] exploit weight sparsity during the training stage by leveraging aggressive pruning algorithms and developing customized hardware to improve performance.…”
Section: Related Work (mentioning, confidence: 99%)
“…Previous work on accelerating DNN training has focused on leveraging the sparsity present in weights and activations [11], [33], [44], [45]. TensorDash [33] accelerates the DNN training process while achieving higher energy efficiency by eliminating the ineffectual operations resulting from the sparse input data. Eager Pruning [45] and Procrustes [44] improve DNN training efficiency by co-designing the training algorithm with the target hardware platform ("hardware-aware training").…”
Section: Accelerators for DNN Training (mentioning, confidence: 99%)
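The citing papers summarize TensorDash's mechanism only briefly: multiply-accumulate operations whose operands are zero are ineffectual and can be skipped, so sparse activations and gradients translate into less work. The following is a minimal software model of that general idea, not the paper's hardware scheduler; the name `sparse_aware_matmul` and the dense NumPy tensors are illustrative assumptions.

```python
# Minimal software model of the general idea behind sparsity-exploiting
# training accelerators such as TensorDash: a multiply-accumulate (MAC)
# is skipped whenever one of its operands is zero, so only "effectual"
# products are computed. Illustrative only; `sparse_aware_matmul` is a
# hypothetical name, not the paper's scheduler.
import numpy as np

def sparse_aware_matmul(acts, weights):
    """Compute acts @ weights while skipping and counting ineffectual MACs."""
    out = np.zeros((acts.shape[0], weights.shape[1]))
    effectual = 0
    for i in range(acts.shape[0]):
        for k in range(acts.shape[1]):
            a = acts[i, k]
            if a == 0.0:          # zero activation: skip every product that uses it
                continue
            for j in range(weights.shape[1]):
                w = weights[k, j]
                if w == 0.0:      # zero weight: the product contributes nothing
                    continue
                out[i, j] += a * w
                effectual += 1
    total = acts.shape[0] * acts.shape[1] * weights.shape[1]
    return out, effectual, total

# ReLU activations are naturally sparse during training, so many MACs vanish.
acts = np.maximum(np.random.randn(8, 16), 0.0)
weights = np.random.randn(16, 4)
_, eff, total = sparse_aware_matmul(acts, weights)
print(f"effectual MACs: {eff}/{total} ({100.0 * eff / total:.1f}%)")
```

On ReLU-activated layers a large fraction of the products are skipped, which is the work reduction the hardware exploits.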
“…Since the sparsification is dynamically changing, most designs cannot efficiently leverage the sparse computation in HASI. To address the challenges of dynamic sparsification in HASI, we developed a hardware-software co-designed accelerator for a dynamically sparsified model, based on the TensorDash [23] architecture. We call our design the Dynamic Sparsified CNN (DySCNN) accelerator.…”
Section: A Noisy Sparsification (mentioning, confidence: 99%)
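The dynamic aspect matters because the nonzero pattern is not known ahead of time. As a purely illustrative sketch (magnitude-based thresholding, the keep ratio, and the name `dynamic_sparsify` are assumptions, not details taken from HASI or DySCNN), the following shows why the sparsity pattern changes with every batch and therefore cannot be scheduled statically.

```python
# Purely illustrative sketch of *dynamic* sparsification: the set of
# nonzeros is decided on the fly for every batch (here by magnitude
# thresholding), so the sparsity pattern cannot be fixed at compile time.
# The keep ratio and the name `dynamic_sparsify` are assumptions.
import numpy as np

def dynamic_sparsify(acts, keep_ratio=0.25):
    """Zero out the smallest-magnitude activations, keeping about keep_ratio of them."""
    flat = np.abs(acts).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(acts) >= threshold, acts, 0.0)

# Two consecutive batches produce different nonzero patterns, so a
# zero-skipping accelerator has to react to each pattern at run time.
batch_a = dynamic_sparsify(np.random.randn(4, 16))
batch_b = dynamic_sparsify(np.random.randn(4, 16))
print("pattern identical across batches:",
      bool(np.array_equal(batch_a != 0, batch_b != 0)))
```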