BitPruner: Network Pruning for Bit-serial Accelerators

Zhao, X.; Wang, Ying; Liu, Cheng; Shi, Cong; Tu, Kaijie; Zhang, Lei

doi:10.1109/dac18072.2020.9218534

Cited by 19 publications

(3 citation statements)

References 10 publications


“…In terms of granularity, accelerators can exploit bit-wise sparsity via bit-serial computation [1,31], unstructured element-wise sparsity of either activations or weights [2,5,6,8,11,20,29,38], or structured sparsity via a co-designed pruning algorithm [17,37,41]. BitPruner [39] applies structured bit-wise pruning to benefit bit-serial architectures. Our approach also falls under the structured pruning category, but with one key distinction: the pruning framework is closely designed with the dataflow.…”

Section: Related Work (mentioning)

confidence: 99%

Cascading structured pruning

Hanson et al. 2022

Proceedings of the 49th Annual International Symposium on Computer Architecture


Performance and efficiency of running modern Deep Neural Networks (DNNs) are heavily bounded by data movement. To mitigate the data movement bottlenecks, recent DNN inference accelerator designs widely adopt aggressive compression techniques and sparse-skipping mechanisms. These mechanisms avoid transferring or computing with zero-valued weights or activations to save time and energy. However, such sparse-skipping logic involves large input buffers and irregular data access patterns, thus precluding many energy-efficient data reuse opportunities and dataflows. In this work, we propose Cascading Structured Pruning (CSP), a technique that preserves significantly more data reuse opportunities for higher energy efficiency while maintaining comparable performance relative to recent sparse architectures such as SparTen. CSP includes the following two components: At algorithm level, CSP-A induces a predictable sparsity pattern that allows for low-overhead compression of weight data and sequential access to both activation and weight data. At architecture level, CSP-H leverages CSP-A's induced sparsity pattern with a novel dataflow to access unique activation data only once, thus removing the demand for large input buffers. Each CSP-H processing element (PE) employs a novel accumulation buffer design and a counter-based sparse-skipping mechanism to support the dataflow with minimum controller overhead. We verify our approach on several representative models. Our simulated results show that CSP achieves on average 15× energy efficiency improvement over SparTen with comparable or superior speedup under most evaluations.
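The citation statement above notes that accelerators can exploit bit-wise sparsity via bit-serial computation. As a rough illustration only, the sketch below models a bit-serial dot product that processes one weight bit per step and skips zero bits. The function name `bit_serial_dot`, the unsigned 8-bit weight format, and the "one cycle per nonzero bit" cost model are assumptions made for this example; it is not the BitPruner algorithm or any specific accelerator's datapath.

```python
def bit_serial_dot(weights, activations, bits=8):
    """Idealized bit-serial dot product over unsigned `bits`-wide weights.

    Returns (result, cycles), where `cycles` counts only the nonzero
    weight bits, i.e. the shift-and-add steps that cannot be skipped.
    """
    total = 0
    cycles = 0
    for w, a in zip(weights, activations):
        if not 0 <= w < (1 << bits):
            raise ValueError("weight does not fit in the assumed bit width")
        for b in range(bits):
            if (w >> b) & 1:        # zero bits contribute nothing and are skipped
                total += a << b     # add the activation shifted by the bit position
                cycles += 1
    return total, cycles


# Hypothetical values: 5 = 0b101 (two set bits), 0 (none), 8 = 0b1000 (one).
result, cycles = bit_serial_dot([5, 0, 8], [3, 7, 2])
```

Under this simplified cost model, the work scales with the number of set bits in the weights rather than with the number of weights, which is why pruning at bit granularity can benefit bit-serial designs.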


“…[1]. While training optimizations for such architectures have recently been proposed, they do not fully solve the scheduling issues [16].…”

Section: Shared Weight Bit-sparsity (mentioning)

confidence: 99%

SWIS -- Shared Weight bIt Sparsity for Efficient Neural Network Acceleration

Li, Romaszkan, Graening et al. 2021

Preprint


Quantization is spearheading the increase in performance and efficiency of neural network computing systems making headway into commodity hardware. We present SWIS (Shared Weight bIt Sparsity), a quantization framework for efficient neural network inference acceleration that delivers improved performance and storage compression through an offline weight decomposition and scheduling algorithm. SWIS can achieve an accuracy improvement of up to 54.3 (19.8) percentage points compared to weight truncation when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit-sparsity in weights. The SWIS accelerator gives up to 6× speedup and 1.9× energy improvement over state-of-the-art bit-serial architectures.


“…Deep learning models especially neural networks are known to be fault-tolerant inherently mainly because of the widely utilized activation functions, pooling layers, and the ranking-based outputs that are usually insensitive to computing variations. Many prior work explored the inherent fault tolerance of neural networks for the sake of higher energy efficiency, performance, and memory footprint with approaches like voltage scaling [12] [13], DRAM refresh scaling [14], and low-bit-width quantization [15] [16]. However, the unique fault-tolerant feature does not guarantee fault tolerance against hardware faults and even results in substantial accuracy variation across the different fault configurations according to the investigation in [17] [18] [19], which essentially aggravates the uncertainty of the deep learning processing and hinders the deployment of deep learning in safety-critical applications.…”

Section: Introduction (mentioning)

confidence: 99%

Fault-Tolerant Deep Learning: A Hierarchical Perspective

Liu, Gao, Liu et al. 2022

Preprint

Self Cite


With the rapid advancements of deep learning in the past decade, it can be foreseen that deep learning will be continuously deployed in more and more safety-critical applications such as autonomous driving and robotics. In this context, reliability turns out to be critical to the deployment of deep learning in these applications and gradually becomes a first-class citizen among the major design metrics like performance and energy efficiency. Nevertheless, the black-box deep learning models combined with the diverse underlying hardware faults make resilient deep learning extremely challenging. In this special session, we conduct a comprehensive survey of fault-tolerant deep learning design approaches with a hierarchical perspective and investigate these approaches from the model layer, architecture layer, circuit layer, and cross layer respectively.
