This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced-precision numerical representations and, specifically, their recently demonstrated ability to tolerate representations of different precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform representation. This work proposes Proteus, which reduces the data traffic and storage footprint needed by DNNs, resulting in reduced energy and improved area efficiency for DNN implementations. Proteus uses a different representation per layer for both the data (neurons) and the weights (synapses) processed by DNNs. Proteus is a layered extension over existing DNN implementations that converts between the numerical representation used by the DNN execution engines and the shorter, layer-specific fixed-point representation used when reading and writing data values to memory, be it on-chip buffers or off-chip memory. Proteus uses a novel memory layout for DNN data, enabling a simple, low-cost, and low-energy conversion unit. We evaluate Proteus as an extension to a state-of-the-art accelerator [7] that uses a uniform 16-bit fixed-point representation. On five popular DNNs, Proteus reduces data traffic among layers by 43% on average while maintaining accuracy within 1% even when compared to a single-precision floating-point implementation. As a result, Proteus improves energy by 15% with no performance loss. Proteus also reduces the data footprint by at least 38% and hence the amount of on-chip buffering needed, resulting in an implementation that requires 20% less area overall. These area savings can be used to improve cost by building smaller chips, to process larger DNNs for the same on-chip area, or to incorporate an additional three execution engines, increasing peak performance by 18%.
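To illustrate the numerical idea behind the per-layer conversion described above, the following is a minimal software sketch, not the paper's hardware conversion unit or memory layout: each layer's values are quantized to a layer-specific fixed-point format (a chosen total bit width and fraction bit width) before being stored, and expanded back to the engine's representation when read. The function names (to_fixed, from_fixed) and the example bit widths are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def to_fixed(values, total_bits, frac_bits):
    # Quantize floats to a signed fixed-point grid with layer-specific
    # total and fractional bit widths, saturating at the representable range.
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return np.clip(np.round(values * scale), lo, hi).astype(np.int32)

def from_fixed(q, frac_bits):
    # Expand stored fixed-point integers back to floats for the compute engine.
    return q.astype(np.float32) / (1 << frac_bits)

# Hypothetical example: a layer whose values tolerate 10 bits with 8 fraction bits.
weights = np.random.randn(4).astype(np.float32)
packed = to_fixed(weights, total_bits=10, frac_bits=8)
restored = from_fixed(packed, frac_bits=8)
```

In a Proteus-style design the per-layer bit widths would be chosen offline so that accuracy stays within the reported 1% of the full-precision baseline, and the narrower values would be packed contiguously in memory to realize the traffic and footprint savings.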