Evaluation of Convolution Primitives for Embedded Neural Networks on 32-Bit Microcontrollers

Nguyen, Baptiste; Moëllic, Pierre-Alain; Blayac, S.

doi:10.1007/978-3-031-27440-4_41

Cited by 2 publications (1 citation statement). References 5 publications.


“…B. Related work a) Deployment and optimisation frameworks for MCUs: Nguyen et al [17] assemble a state-of-the-art family using the open-source NNoM deployment framework. They perform an experimental characterisation of convolution operator implementations and observe a linear relationship between theoretical multiply-accumulate operations (MACs) and energy consumption, highlighting the benefits of using computationally efficient primitives such as shift convolution.…”

Section: A Background (citation type: mentioning; confidence: 99%)
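The MAC counts behind the comparison in the statement above can be sketched in C. This is an illustrative sketch, not code from either paper. The helper names (`out_dim`, `macs_standard`, `macs_shift`, `energy_estimate_nj`) are hypothetical. The sketch assumes the common formulation of shift convolution in which the per-channel spatial shift performs no multiplications and is followed by a 1×1 pointwise convolution. The linear energy model E = a·MACs + b uses placeholder coefficients that would have to be fitted from measurements on a given board.

```c
#include <assert.h>
#include <stdint.h>

/* Output spatial size for a square kernel of size k with symmetric padding. */
uint32_t out_dim(uint32_t in, uint32_t k, uint32_t pad, uint32_t stride) {
    assert(stride > 0 && in + 2 * pad >= k);
    return (in + 2 * pad - k) / stride + 1;
}

/* Standard convolution: each of the h_out*w_out*c_out outputs needs k*k*c_in MACs. */
uint64_t macs_standard(uint32_t h_out, uint32_t w_out,
                       uint32_t c_in, uint32_t c_out, uint32_t k) {
    return (uint64_t)h_out * w_out * c_out * k * k * c_in;
}

/* Shift convolution (assumed formulation): the shift moves each input channel
   by a fixed offset and costs 0 MACs; only the 1x1 pointwise convolution that
   follows contributes c_in MACs per output element. */
uint64_t macs_shift(uint32_t h_out, uint32_t w_out,
                    uint32_t c_in, uint32_t c_out) {
    return (uint64_t)h_out * w_out * c_out * c_in;
}

/* Hypothetical linear energy model: E = nj_per_mac * MACs + offset_nj.
   Both coefficients are placeholders, not measured values. */
double energy_estimate_nj(uint64_t macs, double nj_per_mac, double offset_nj) {
    return nj_per_mac * (double)macs + offset_nj;
}
```

For a 16×16×32 input with 3×3 kernels, padding 1, stride 1, and 64 output channels, `macs_standard` gives 4,718,592 MACs and `macs_shift` gives 524,288, a factor of k² = 9 fewer. Under a linear model, the estimated energy would fall roughly in proportion, apart from the fixed offset.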

Optimizing Convolutions for Deep Learning Inference on ARM Cortex-M Processors

Maciá-Lillo, Barrachina, Fabregat et al. 2024, IEEE Internet Things J.


We perform a series of optimisations on the convolution operator within the ARM CMSIS-NN library to improve the performance of deep learning tasks on Arduino development boards equipped with ARM Cortex-M4 and M7 microcontrollers. To this end, we develop custom microkernels that efficiently handle the internal computations required by the convolution operator via the lowering approach and the direct method, and we design two techniques to avoid register spilling. We also take advantage of all the RAM on the Arduino boards by reusing it as a scratchpad for the convolution filters. The integration of these techniques into CMSIS-NN, when invoked by TensorFlow Lite for Microcontrollers for quantised versions of VGG, SqueezeNet, ResNet, and MobileNet-like convolutional neural networks, enhances the overall inference speed by a factor ranging from 1.13× to 1.50×.
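The two strategies named in this abstract, the lowering approach (im2col followed by a matrix multiplication) and the direct method, can be contrasted with a minimal C sketch. This is neither CMSIS-NN's code nor the authors' microkernels. The function names `conv_direct` and `conv_im2col` are hypothetical. The sketch assumes int8 operands in channels-last (HWC) layout with int32 accumulators, filters stored as [c_out][k][k][c_in], unit stride, and no padding. It omits bias, requantisation, and the register blocking that real microkernels rely on.

```c
#include <assert.h>
#include <stdint.h>

/* Direct method: accumulate each output straight from the strided input window. */
void conv_direct(const int8_t *in, int h, int w, int c_in,
                 const int8_t *wt, int c_out, int k, int32_t *out) {
    assert(k <= h && k <= w);
    int ho = h - k + 1, wo = w - k + 1;
    for (int oy = 0; oy < ho; ++oy)
        for (int ox = 0; ox < wo; ++ox)
            for (int oc = 0; oc < c_out; ++oc) {
                int32_t acc = 0;
                for (int ky = 0; ky < k; ++ky)
                    for (int kx = 0; kx < k; ++kx)
                        for (int ic = 0; ic < c_in; ++ic)
                            acc += (int32_t)in[((oy + ky) * w + (ox + kx)) * c_in + ic]
                                 * (int32_t)wt[((oc * k + ky) * k + kx) * c_in + ic];
                out[(oy * wo + ox) * c_out + oc] = acc;
            }
}

/* Lowering approach: copy each receptive field into a contiguous column buffer
   `col` (k*k*c_in bytes, caller-provided), then compute one row of the GEMM as
   dot products between that column and each filter. */
void conv_im2col(const int8_t *in, int h, int w, int c_in,
                 const int8_t *wt, int c_out, int k,
                 int8_t *col, int32_t *out) {
    assert(k <= h && k <= w);
    int ho = h - k + 1, wo = w - k + 1;
    int patch = k * k * c_in;
    for (int oy = 0; oy < ho; ++oy)
        for (int ox = 0; ox < wo; ++ox) {
            /* im2col step: column order (ky, kx, ic) matches the filter layout. */
            int j = 0;
            for (int ky = 0; ky < k; ++ky)
                for (int kx = 0; kx < k; ++kx)
                    for (int ic = 0; ic < c_in; ++ic)
                        col[j++] = in[((oy + ky) * w + (ox + kx)) * c_in + ic];
            /* GEMM step: contiguous dot product with every filter. */
            for (int oc = 0; oc < c_out; ++oc) {
                const int8_t *f = wt + oc * patch;
                int32_t acc = 0;
                for (j = 0; j < patch; ++j)
                    acc += (int32_t)col[j] * (int32_t)f[j];
                out[(oy * wo + ox) * c_out + oc] = acc;
            }
        }
}
```

The lowered version needs an extra k·k·C_in buffer but turns the inner loop into a contiguous dot product, which is easier to vectorise. The direct version needs no buffer but reads the input with strided access. Both produce identical sums because the column order matches the filter layout.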
