2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00140
A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks

Cited by 3 publications (3 citation statements)
References 8 publications
“…Yang (2020) takes this one step further and introduces the Filter Gradient Descent framework, which can apply different types of filters to the noisy gradients to obtain a better estimate of the true gradient. In binary network optimization, Bop (Helwegen et al., 2019) and its extension (Suarez-Ramirez et al., 2021) introduce a threshold that is compared against the EMA-smoothed gradient to decide whether to flip a binary weight. In our paper, we build on second-order gradient filtering techniques to reinterpret the hyperparameters that influence the latent weight updates.…”
Section: Related Work
confidence: 99%
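
To make the thresholded-EMA mechanism described in this statement concrete, here is a minimal NumPy sketch of the Bop update rule from Helwegen et al. (2019). The function name and the hyperparameter defaults are illustrative assumptions, not code taken from either cited paper.

import numpy as np

def bop_step(w, grad, m, gamma=1e-3, tau=1e-6):
    """One Bop update step (a sketch; names and defaults are assumed).

    w     binary weights in {-1, +1}
    grad  stochastic gradient w.r.t. w
    m     exponential moving average (EMA) of past gradients
    gamma EMA adaptivity rate
    tau   flip threshold
    """
    # Smooth the noisy stochastic gradient with an EMA.
    m = (1.0 - gamma) * m + gamma * grad
    # Flip a weight only when the smoothed gradient is strong (|m| > tau)
    # and points in the same direction as the weight, i.e. when flipping
    # the weight is expected to decrease the loss.
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))
    return np.where(flip, -w, w), m

Because the weights stay in {-1, +1} and are only ever flipped, no latent real-valued weights need to be stored, which is the central departure of Bop from latent-weight optimizers such as Adam.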
“…Obviously, this introduces a gradient mismatch between the actual gradient of the … [rows of a comparison table of BNN methods (e.g. BENN [55], Circulant BNN [28], CI-BCNN [46], ReCU [49], Bop and beyond [41], Sub-bit BNN [45]) spilled into the extracted quote here] … In recent years, there have been works that change the clipping interval of the STE. For example, BinaryDenseNet [4] and MeliusNet [3] use an interval of [−1.3, +1.3], whereas PokeBNN [53] uses an interval of [−3, +3].…”
Section: Binarizer (STE)
confidence: 99%
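
Since this statement turns on the clipping interval of the straight-through estimator (STE), a short PyTorch sketch may help. The class name, its clip argument, and the default interval are assumptions for illustration, not code from any of the cited works.

import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarizer with a clipped straight-through estimator (sketch)."""

    @staticmethod
    def forward(ctx, x, clip=1.0):
        ctx.save_for_backward(x)
        ctx.clip = clip
        # Forward pass: hard binarization; sign(0) is often mapped to +1
        # in practice, a detail omitted here for brevity.
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Backward pass: pass the gradient through unchanged where
        # |x| <= clip, and zero it outside the interval.
        mask = (x.abs() <= ctx.clip).to(grad_out.dtype)
        return grad_out * mask, None

Setting clip=1.3 or clip=3.0 reproduces the wider pass-through intervals that the quote attributes to BinaryDenseNet/MeliusNet and to PokeBNN, respectively.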
“…In addition to the default optimizers, [18,41] have developed new optimizers dedicated to BNNs, called Bop and Bop2ndOrder respectively.…”
Section: Optimizer
confidence: 99%
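
The name Bop2ndOrder refers to the second-order extension of Bop introduced in the paper under discussion. As a rough sketch of what such a second-order rule can look like, the update below normalizes the first-moment EMA by the square root of a second-moment EMA, Adam-style, before thresholding; the function name, hyperparameter names, and defaults are assumptions, and the exact rule should be taken from Suarez-Ramirez et al. (2021).

import numpy as np

def bop2ndorder_step(w, grad, m, v, gamma=1e-3, sigma=1e-3, tau=1e-6, eps=1e-10):
    """Second-order Bop-style update (a sketch under stated assumptions)."""
    m = (1.0 - gamma) * m + gamma * grad       # first-moment EMA
    v = (1.0 - sigma) * v + sigma * grad**2    # second-moment EMA
    # Normalize the smoothed gradient by its smoothed magnitude so the
    # flip threshold tau acts on a scale-free signal, then flip exactly
    # as in Bop.
    signal = m / (np.sqrt(v) + eps)
    flip = (np.abs(signal) > tau) & (np.sign(signal) == np.sign(w))
    return np.where(flip, -w, w), m, v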