2021
DOI: 10.48550/arxiv.2112.00133
Preprint

PokeBNN: A Binary Pursuit of Lightweight Accuracy

Abstract: Top-1 ImageNet optimization promotes enormous networks that may be impractical in inference settings. Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose Poke-Conv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function. We apply it to ResNet-50 and optimize ResNet's initial convolutional layer …
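
As a rough illustration of the ingredients named in the abstract, the sketch below shows a generic binarized convolution with a real-valued residual shortcut. It is a hypothetical, minimal NumPy example, not the actual Poke-Conv block from the paper; the function names and the 1-D toy setting are invented for illustration.

```python
import numpy as np

def binarize(x):
    # Sign binarization: map each value to -1 or +1 (zero mapped to +1).
    return np.where(x >= 0, 1.0, -1.0)

def binary_conv1d(x, w):
    # Toy 1-D "binary" convolution: activations and weights are binarized,
    # so every multiply is +/-1 and the accumulation could be done with
    # XNOR/popcount logic on real hardware.
    return np.convolve(binarize(x), binarize(w), mode="same")

def binary_block(x, w):
    # Binary convolution plus a real-valued residual shortcut; extra
    # residual paths of this kind are one way BNNs recover accuracy
    # lost to binarization (the paper's Poke-Conv uses its own design).
    return binary_conv1d(x, w) + x

x = np.random.randn(16)
w = np.random.randn(5)
print(binary_block(x, w).shape)  # (16,)
```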

Cited by 3 publications (4 citation statements)
References 33 publications (51 reference statements)

“…The research in BNNs is focussed on bringing deep learning to resource-constrained edge devices. Recent studies report the computational complexity of their models using theoretical metrics such as floating-point operations (FLOPs) [24,27], multiply-accumulate (MACs) [4] or arithmetic computation effort (ACE) [40]. In coherence with [3,30] we argue that latency is the best metric to compare model performances.…”
Section: Methods
mentioning confidence: 99%
“…Note that while the compute cost of a SE-like module is usually considered to be negligible, its parameter size cannot be ignored in BNN models. We calculated the total number of operations (OPs) as OPs = FLOPs + (BOPs / 64) + (int4 OPs / 16), following [18,19,32]. In the case of parameters, binary weights are 1-bit, weights of SE-like modules are 8-bit, and other real-valued parameters and weights are considered as 32-bit.…”
Section: Cost Analysis
mentioning confidence: 99%
“…We use Arithmetic Computation Effort (ACE) (Zhang et al, 2021), a newly proposed hardware- and energy-inspired cost metric, to evaluate the inference cost of quantized BERT models. ACE is defined as …
Table 1: We quantize 32-bit baseline models to 8-bits by three quantization methods.…”
Section: Downstream Language Tasks and Evaluation Metrics
mentioning confidence: 99%
“…I and J are sets of all quantization bits used for inference. ACE is shown to be well correlated to the actual energy consumption on Google TPUs hardware and used to evaluate the inference cost of Binary Neural Networks in Zhang et al (2021).…”
Section: Downstream Language Tasks and Evaluation Metrics
mentioning confidence: 99%
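
For context, the ACE metric referenced in these statements charges each multiply-accumulate by the product of its operand bit-widths. A sketch of the definition, reconstructed to be consistent with the quoted description (the symbol n_{i,j} is our notation, not necessarily the paper's):

```latex
\mathrm{ACE} \;=\; \sum_{i \in I} \sum_{j \in J} n_{i,j} \cdot i \cdot j
```

where n_{i,j} denotes the number of multiply-accumulate operations between an i-bit and a j-bit operand, and I, J are the sets of quantization bit-widths used at inference.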