2021 IEEE International Symposium on Circuits and Systems (ISCAS)
DOI: 10.1109/iscas51556.2021.9401214

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Abstract: Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering, latch-based memories, and voltage scaling, a throughput of 241 GOPS is achieved while consuming just 1.1 mW at 0.4 V…

Cited by 13 publications (11 citation statements)
References 23 publications
“…It is clear that parallelism and data reuse (either in the form of locally buffering or by broadcasting) are the keys to amortizing the memory access cost, which is so much larger than the low-precision arithmetic cost. Techniques to mitigate these costs are to replace SRAM with low-voltage SCM, hard-wire network parameters to enable broadcasting, and use the sliding window principle (like the FMM banks in combination with the crossbar in ChewBaccaNN [1]). In essence, all these solutions boil down to designing the architecture around the data movements in a less-flexible manner.…”
Section: Comparison and Discussion
confidence: 99%
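The citation above argues that memory accesses, not the low-precision arithmetic, dominate cost, and that sliding-window buffering amortizes them. A minimal sketch of that accounting (the function names and the 1-D convolution setting are illustrative, not from the paper): a naive loop re-fetches every overlapping input of each window, while a k-entry local buffer that shifts by one position per output fetches each input word from memory only once.

```python
# Hypothetical fetch-count sketch for a 1-D convolution with kernel size k.
# "Fetch" = one read of an input word from the large (expensive) memory.

def naive_fetches(n_inputs: int, k: int) -> int:
    # Every one of the (n_inputs - k + 1) outputs re-reads its full
    # k-wide input window from memory.
    return (n_inputs - k + 1) * k

def sliding_window_fetches(n_inputs: int, k: int) -> int:
    # A k-entry local buffer (akin to hardware line buffers) is filled
    # once, then shifts by one entry per output: k initial fetches plus
    # one new fetch for each remaining input word.
    return k + (n_inputs - k)

print(naive_fetches(32, 3))           # 90 fetches
print(sliding_window_fetches(32, 3))  # 32 fetches: one per input word
```

The gap widens with kernel size, which is why hard-wiring the kernel dimension around such buffers, as the citation describes, pays off despite the loss of flexibility.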
“…A full system-on-a-chip (SoC), implemented in 22 nm technology, is presented, including the accelerator, a RISC host processor, and peripherals. • ChewBaccaNN [1] is an architecture for binary neural network inference that exploits efficient data reuse by co-designing the memory hierarchy with the neural network run on the architecture. The hard-wired kernel size allows efficient data reuse.…”
Section: Five Low- and Mixed-Precision Accelerators Reviewed
confidence: 99%
“…Replacing a real-valued MAC with XNOR-PopCount greatly improves energy efficiency. For example, [2] presents a BNN accelerator that achieves as little as 4.48 fJ/Op in GF 22 nm at 0.4 V, where an Op is a binary operation (XNOR or popcount). The most common network architecture used in BNN papers is illustrated in Fig.…”
Section: Inference
confidence: 99%
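The XNOR-PopCount trick quoted above can be sketched in a few lines. With ±1 weights and activations encoded as bit vectors (+1 → 1, −1 → 0), the dot product of two n-element binary vectors reduces to a popcount of their XNOR, mapped back to the ±1 domain. The function below is an illustrative software model, not the accelerator's datapath:

```python
# Sketch of a binary dot product via XNOR + popcount.
# Encoding assumption: +1 -> bit 1, -1 -> bit 0.

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-element {+1,-1} vectors packed as ints."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # bit set where signs agree
    pop = bin(xnor).count("1")                  # number of +1 products
    # pop products are +1 and (n - pop) are -1, so the sum is:
    return 2 * pop - n

# a = [+1, -1, +1, -1] -> 0b1010, w = [+1, +1, -1, -1] -> 0b1100
print(binary_dot(0b1010, 0b1100, 4))  # 0 (products: +1, -1, -1, +1)
```

Because XNOR and popcount operate on whole words per cycle and need no multipliers, the per-operation energy drops to the femtojoule range cited above.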
“…Similar to ABCNet [26], a weighted sum is used to get back to the original feature size. Compared to the per-layer ensemble, this ensemble expands the network by a factor of N rather than N².…”
Section: Ensemble
confidence: 99%
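The N-versus-N² claim in the last citation can be illustrated with a simple parameter count. A hypothetical sketch (the function names and the uniform-layer assumption are illustrative): if each layer is replicated into N parallel branches and every branch feeds every branch of the next layer, per-layer weight counts scale with N·N; collapsing the N branch outputs back to the original feature size with a learned weighted sum keeps each layer at its original width, so the scaling is only N.

```python
# Hypothetical parameter-count sketch for two ensemble styles,
# assuming every layer has the same base parameter count.

def per_layer_ensemble_params(base_params: int, n: int) -> int:
    # N branch inputs x N branch outputs per layer -> N^2 scaling.
    return base_params * n * n

def weighted_sum_ensemble_params(base_params: int, n: int) -> int:
    # N branches, each collapsed back to the original feature size
    # by a weighted sum before the next layer -> N scaling.
    return base_params * n

print(per_layer_ensemble_params(1000, 3))     # 9000
print(weighted_sum_ensemble_params(1000, 3))  # 3000
```

(The small weighted-sum coefficients themselves add only N parameters per layer, negligible next to the weight tensors, so they are omitted here.)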