Training Binary Weight Networks via Semi-Binary Decomposition (2018)
DOI: 10.1007/978-3-030-01261-8_39

Cited by 14 publications (5 citation statements)
References 18 publications
“…However, state-of-the-art neural network models require massive numbers of parameters and large model sizes to achieve good performance on different tasks, which also incurs significant computational complexity and resource consumption. To compress and accelerate deep CNNs, many approaches have been proposed, which can be classified into five categories: transferred/compact convolutional filters [89,85,78]; quantization/binarization [35,11,82,92]; knowledge distillation [12,86,16]; pruning [28,31,22]; low-rank factorization [46,38,47,79].…”
Section: Related Work (mentioning)
confidence: 99%
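The excerpt above files this paper under quantization/binarization. As a rough illustration of that category, the following is a minimal NumPy sketch of per-filter binary weight quantization in the BWN style (scaling factor times sign matrix, with alpha = mean(|W|) as the least-squares-optimal scale); it is an assumed generic example, not the semi-binary decomposition the paper itself proposes:

```python
import numpy as np

def binarize_weights(W):
    """Per-filter binary quantization: W ~ alpha * B with B in {-1, +1}.

    For fixed B = sign(W), alpha = mean(|W|) minimizes ||W - alpha * B||^2.
    W has shape (out_channels, ...); one alpha per output filter.
    (Generic BWN-style sketch, not the paper's semi-binary decomposition.)
    """
    flat = W.reshape(W.shape[0], -1)
    alpha = np.abs(flat).mean(axis=1)      # optimal per-filter scaling factor
    B = np.sign(flat)
    B[B == 0] = 1                          # map sign(0) to +1 so B stays binary
    W_bin = (alpha[:, None] * B).reshape(W.shape)
    return W_bin, alpha, B.reshape(W.shape)

# Usage: quantize a random conv weight tensor and measure the error.
W = np.random.randn(16, 3, 3, 3).astype(np.float32)
W_q, alpha, B = binarize_weights(W)
print("reconstruction MSE:", float(np.mean((W - W_q) ** 2)))
```

With binary B, each multiply-accumulate against the weights reduces to an add/subtract plus one scale per filter, which is the source of the speedups these citing papers refer to.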
“…parameter quantizing [32,33,34,35,36,37,38,39,40,41], low-rank parameter factorization [42,43,44,45,46], transferred/compact convolutional filters [47,48,49,50], and knowledge distillation [51,52,53,54,55,56]. Parameter pruning and quantizing mainly focus on eliminating redundancy in the model parameters, respectively by removing redundant/uncritical parameters or by compressing the parameter space (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
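This excerpt lists low-rank parameter factorization as a separate category. A generic truncated-SVD sketch (illustrative only; it does not reproduce any of the cited factorization methods) shows how replacing one weight matrix with two thin factors cuts both parameters and matmul cost:

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as U @ V with U (m x r) and V (r x n).

    A rank-r truncated SVD is the best rank-r approximation in the
    Frobenius norm; storing (U, V) instead of W cuts parameters from
    m*n to r*(m + n), and the matmul cost shrinks accordingly.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

# Usage: factorize a fully-connected layer's weights at rank 32.
W = np.random.randn(512, 256).astype(np.float32)
U, V = low_rank_factorize(W, rank=32)
print("params:", W.size, "->", U.size + V.size)
print("relative error:", np.linalg.norm(W - U @ V) / np.linalg.norm(W))
```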
“…They visualize the feature maps extracted by different filters and view each filter as a visual unit focusing on different visual components. … of the ResNet-50 [28], and meanwhile save more than 75% of parameters and 50% of computational time. In the literature, approaches for compressing deep networks can be classified into five categories: parameter pruning [26,29,30,31], parameter quantizing [32,33,34,35,36,37,38,39,40,41], low-rank parameter factorization [42,43,44,45,46], transferred/compact convolutional filters [47,48,49,50], and knowledge distillation [51,52,53,54,55,56]. Parameter pruning and quantizing mainly focus on eliminating redundancy in the model parameters, respectively by removing redundant/uncritical parameters or by compressing the parameter space (e.g.…”
(mentioning)
confidence: 99%
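For the pruning category named in this excerpt ("removing the redundant/uncritical ones"), a minimal sketch of unstructured magnitude pruning follows; this is a generic illustration under assumed conventions, not the method of any particular cited reference:

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Unstructured magnitude pruning: zero out the smallest-|w| fraction.

    sparsity is the fraction of weights to remove (0.75 keeps 25%).
    Returns the pruned weights and the binary keep-mask.
    """
    threshold = np.quantile(np.abs(W), sparsity)
    mask = (np.abs(W) > threshold).astype(W.dtype)
    return W * mask, mask

# Usage: prune 75% of a layer's weights by magnitude.
W = np.random.randn(256, 256).astype(np.float32)
W_pruned, mask = magnitude_prune(W, sparsity=0.75)
print("kept fraction:", float(mask.mean()))   # ~0.25
```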
“…By using FP16, the final accuracy is determined by the selected network and the corresponding BWN training algorithm. A ResNet-18 trained on the ImageNet dataset can run on Hyperdrive with 87.1% top-5 accuracy using the SBD-FQ training method [55] (full-precision top-5 accuracy: 89.2%).…”
Section: Results (mentioning)
confidence: 99%