2017 51st Asilomar Conference on Signals, Systems, and Computers
DOI: 10.1109/acssc.2017.8335699

Minimum energy quantized neural networks

Abstract: This work targets the automated minimum-energy optimization of Quantized Neural Networks (QNNs): networks using low-precision weights and activations. These networks are trained from scratch at an arbitrary fixed-point precision. At iso-accuracy, QNNs using fewer bits require deeper and wider network architectures than networks using higher-precision operators, while they require less complex arithmetic and fewer bits per weight. This fundamental trade-off is analyzed and quantified to find the minimum energy …

Cited by 123 publications (75 citation statements)
References 9 publications (14 reference statements)
“…The accuracy drop is limited to 3% when running ResNet50 on ImageNet with 2-bit weights and 4-bit activations, and to 6.5% when downscaling the weights and activations to 2 bits. Furthermore, the authors of [11] investigated the trade-off between energy efficiency and accuracy of QNNs, highlighting the practical effectiveness of sub-byte fixed-point networks. At the cost of specific retraining procedures, the accuracy drop is kept very close to the single-precision floating-point counterpart. (Footnote 2: The τ_p thresholds absorb the bias, batch normalization, and the 2^(−2(Q−1)) factor.)…”
Section: A Quantized Neural Network
confidence: 99%
“…12a shows the setup for offline, off-chip training. As the synaptic weights of ODIN have a 3-bit resolution, offline training is carried out with quantization-aware stochastic gradient descent (SGD) following [57], as implemented in [58] using Keras with a TensorFlow backend. The chosen optimizer is Adam, which optimizes the weights by minimizing the categorical cross-entropy loss over several epochs, each epoch consisting of one presentation of all labeled images in the training set.…”
Section: B Neuron and Synapse Characterization
confidence: 99%
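The quantization-aware SGD described above can be sketched as follows. This is a minimal illustrative example, not the cited implementation: the 3-bit uniform quantizer, the function names, and the loss gradient are all assumptions, but the core idea matches the snippet, namely that updates are applied to a full-resolution "shadow" copy while the quantized weights are what the network actually deploys.

```python
import numpy as np

def quantize(w, n_bits=3, w_max=1.0):
    """Hypothetical uniform symmetric fixed-point quantizer:
    2**n_bits levels spanning [-w_max, w_max)."""
    scale = w_max / 2 ** (n_bits - 1)            # LSB size, e.g. 0.25 for 3 bits
    q = np.clip(np.round(w / scale), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return q * scale

def qat_sgd_step(w_shadow, grad, lr=0.1):
    """One quantization-aware SGD step: the gradient comes from a forward
    pass with quantized weights, but the update is applied to the
    full-resolution shadow copy, whose quantized version is then deployed."""
    w_shadow = w_shadow - lr * grad              # update full-precision weights
    return w_shadow, quantize(w_shadow)          # deploy the 3-bit copy

w = np.array([0.31, -0.72, 0.05])
w, w_q = qat_sgd_step(w, grad=np.array([0.1, -0.2, 0.0]))
```

After one step the shadow weights move continuously while the deployed weights snap to the nearest of the eight 3-bit levels.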
“…Input images are split with interleaved sub-sampling into four independent 14×14 images. The sub-image pixels are converted to rate-based, Poisson-distributed spike trains and sent to four one-hidden-layer fully connected networks resulting from Adam-based quantization-aware training in Keras following [2], [3]. Layer-wise inhibitory neurons compensate for the rescaling of synaptic weights trained with −1 and +1 values in Keras to values of 0 and 1 in MorphIC.…”
Section: Output Classification
confidence: 99%
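The interleaved sub-sampling and rate-based Poisson encoding mentioned above can be sketched as below. This is a hedged illustration, not the MorphIC pipeline: the per-step spike probability, `max_rate`, and the function names are assumptions; only the 28×28 → four 14×14 split and the "brighter pixel, higher firing rate" idea come from the snippet.

```python
import numpy as np

def interleave_split(img):
    """Interleaved sub-sampling: a 28x28 image becomes four independent
    14x14 sub-images by taking every second pixel with the four offsets."""
    return [img[r::2, c::2] for r in (0, 1) for c in (0, 1)]

def poisson_spike_trains(pixels, n_steps=100, max_rate=0.5, rng=None):
    """Rate-based Poisson encoding: each normalized pixel intensity in [0, 1]
    sets a Bernoulli spike probability per time step, so brighter pixels
    spike more often on average (assumed encoding parameters)."""
    rng = np.random.default_rng(rng)
    p = np.clip(pixels, 0.0, 1.0) * max_rate     # per-step spike probability
    return (rng.random((n_steps,) + pixels.shape) < p).astype(np.uint8)

subs = interleave_split(np.zeros((28, 28)))      # four 14x14 sub-images
spikes = poisson_spike_trains(np.array([0.0, 1.0]), n_steps=1000, rng=0)
```

Each sub-image's spike trains would then feed one of the four one-hidden-layer networks.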
“…A teacher signal is required for supervised online learning, whereas teacher-less learning is unsupervised. … operations and minimizing the memory footprint, thus avoiding the high energy cost of off-chip memory accesses if all the weights can be stored in on-chip memory [2]. The accuracy drop induced by quantization can be mitigated to acceptable levels for many applications with quantization-aware training techniques that propagate binary weights during the forward pass and keep full-resolution weights for the backpropagation updates [3].…”
confidence: 99%
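The binary-forward, full-resolution-backward scheme described in the last snippet (the BinaryConnect-style approach of [3]) can be sketched as follows. This is a minimal single-layer, squared-error example under assumed names and a straight-through gradient; it is not the cited implementation.

```python
import numpy as np

def binarize(w):
    """Forward pass uses binary weights in {-1, +1} (sign of the real copy)."""
    return np.where(w >= 0, 1.0, -1.0)

def train_step(w_real, x, target, lr=0.1):
    """BinaryConnect-style update: propagate binary weights forward,
    then apply the straight-through gradient (treating binarization as
    identity) to the full-resolution shadow weights, clipped to [-1, 1]."""
    w_b = binarize(w_real)
    y = x @ w_b                                  # forward pass, binary weights
    grad_y = y - target                          # d(0.5*(y - target)**2)/dy
    grad_w = np.outer(x, grad_y)                 # straight-through estimate
    return np.clip(w_real - lr * grad_w, -1.0, 1.0)

w = np.array([[0.2], [-0.5]])
x = np.array([1.0, 1.0])
w_next = train_step(w, x, target=np.array([2.0]))
```

Only the full-resolution copy accumulates the small gradient steps; the deployed weights stay binary, which is what keeps the inference-time memory footprint small.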