Interspeech 2019
DOI: 10.21437/interspeech.2019-2394

Neural Network Distillation on IoT Platforms for Sound Event Detection

Abstract: In most classification tasks, wide and deep neural networks perform and generalize better than their smaller counterparts, in particular when they are exposed to large and heterogeneous training sets. However, in the emerging field of the Internet of Things, memory footprint and energy budget pose severe limits on the size and complexity of the neural models that can be implemented on embedded devices. The Student-Teacher approach is an attractive strategy to distill knowledge from a large network into smaller ones…
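The abstract names the Student-Teacher (knowledge distillation) approach. As a minimal sketch of how such a loss is commonly formulated (Hinton-style temperature-softened targets; the temperature, weighting, and function names below are illustrative assumptions, not taken from the paper):

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy on the hard labels and
    (b) cross-entropy between temperature-softened teacher and student
    distributions (equivalent to KL divergence up to a constant)."""
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)
    log_soft_student = tf.nn.log_softmax(student_logits / temperature)
    # Soft-target term, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures (as in Hinton et al., 2015).
    kd = -tf.reduce_mean(
        tf.reduce_sum(soft_teacher * log_soft_student, axis=-1))
    kd *= temperature ** 2
    ce = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        labels, student_logits, from_logits=True))
    return alpha * ce + (1.0 - alpha) * kd
```

The student is trained to match the teacher's softened output distribution while still fitting the ground-truth labels; shrinking the student architecture is then a separate design choice.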

Cited by 24 publications (16 citation statements). References 18 publications.

Citation statements:
“…On the other hand, with the advent of deep learning techniques, the size of machine learning models has grown rapidly, driven by improvements in processor speeds and the availability of large training data. However, embedded systems cannot sustain the resource requirements of standard deep learning techniques, which are suited to GP-GPUs [6,14,33].…”
Section: Introduction
“…TinyML has been applied to several different classes of problems and devices, such as audio processing and sound event detection [6,38], biosignal processing [21], gesture recognition [43], and general time-series data [15]. Among the several application domains in which to explore this novel trend, computer vision is a prime target for optimisation, since it accounts for the largest computational cost in network inference.…”
Section: Introduction
“…Going from state-of-the-art neural models to an actual implementation on an IoT device involves multiple stages, as depicted in Fig. 1. In our previous publication [14], we presented a KD approach to compress a SED classifier composed of the publicly available VGGish feature extractor [15] and a recurrent classifier. Unlike common applications of KD, which aim at improving performance or at modest reductions of the model dimensions, we obtained very high compression factors, reducing the network size from approximately 70 million parameters to nearly 20 thousand.…”
Section: Introduction
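The paper's own code is not reproduced here; as a rough, hypothetical illustration of the scale that the quoted compression implies, the sketch below builds a small recurrent classifier of the same order of magnitude (about 17 thousand parameters) and counts them with Keras. The layer sizes and the 10-class output are assumptions, not the authors' exact student network:

```python
import tensorflow as tf

# Hypothetical compact student: one GRU over 64-band log-mel frames.
# Sizes are illustrative; the distilled network in [14] is reported
# only as "nearly 20 thousand parameters".
student = tf.keras.Sequential([
    tf.keras.layers.GRU(48, input_shape=(None, 64)),  # (time, mel bands)
    tf.keras.layers.Dense(10, activation="sigmoid"),  # example: 10 classes
])
student.summary()  # roughly 17k trainable parameters in total
```

For comparison, the original teacher (VGGish plus a recurrent classifier) accounts for approximately 70 million parameters, which is what makes such aggressive distillation necessary on IoT-class hardware.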
“…In this paper, we focus in particular on: i) a preliminary analysis of the computational and memory requirements, to understand which models can be afforded by a given class of microcontrollers; ii) the quantization of the network parameters and activations, presenting two different strategies to select the best fixed-point representation for each layer; iii) an implementation of the reduced network on a microcontroller with resources typical of an IoT end-node, building upon the network reduction strategies presented in [14]; and iv) an evaluation of the accuracy of the actual implementation. In addition, we present an improvement of the KD approach of [14]: distillation is performed in two stages, in which the adaptation of the pre-trained VGGish feature extractor to the in-domain data is separated from the actual parameter distillation, leading to a further improvement in classification accuracy.…”
Section: Introduction
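Point ii) in the statement above describes per-layer fixed-point quantization. As a minimal sketch of one plausible selection strategy (an assumption for illustration, not necessarily either of the two strategies in the paper): for a fixed word length, sweep the number of fractional bits per layer and keep the Q-format that minimizes the quantization error on that layer's weights.

```python
import numpy as np

def best_q_format(weights, word_bits=8):
    """Return the number of fractional bits (and resulting MSE) that
    best represents `weights` in signed fixed point with `word_bits`
    total bits. A per-layer sweep; criteria other than MSE are possible."""
    best_frac, best_err = 0, np.inf
    int_lo, int_hi = -2 ** (word_bits - 1), 2 ** (word_bits - 1) - 1
    for frac in range(word_bits):
        scale = 2.0 ** frac
        quantized = np.clip(np.round(weights * scale), int_lo, int_hi) / scale
        err = float(np.mean((weights - quantized) ** 2))
        if err < best_err:
            best_frac, best_err = frac, err
    return best_frac, best_err

# Layers with small dynamic range get more fractional bits:
w = 0.2 * np.random.randn(1000)
print(best_q_format(w, word_bits=8))
```

Activations can be handled the same way, using value ranges collected on a calibration set rather than the stored weights.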