Interspeech 2019
DOI: 10.21437/interspeech.2019-2394

Neural Network Distillation on IoT Platforms for Sound Event Detection

Abstract: In most classification tasks, wide and deep neural networks perform and generalize better than their smaller counterparts, in particular when they are exposed to large and heterogeneous training sets. However, in the emerging field of the Internet of Things, memory footprint and energy budget pose severe limits on the size and complexity of the neural models that can be implemented on embedded devices. The Student-Teacher approach is an attractive strategy to distill knowledge from a large network into smaller ones…
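The abstract names the Student-Teacher (knowledge distillation) approach. As a minimal sketch of how such a loss is commonly formulated (Hinton-style temperature-softened targets; the temperature, weighting, and function names below are illustrative assumptions, not taken from the paper):

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy on the hard labels and
    (b) cross-entropy between temperature-softened teacher and student
    distributions (equivalent to KL divergence up to a constant)."""
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)
    log_soft_student = tf.nn.log_softmax(student_logits / temperature)
    # Soft-target term, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures (as in Hinton et al., 2015).
    kd = -tf.reduce_mean(
        tf.reduce_sum(soft_teacher * log_soft_student, axis=-1))
    kd *= temperature ** 2
    ce = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        labels, student_logits, from_logits=True))
    return alpha * ce + (1.0 - alpha) * kd
```

The student is trained to match the teacher's softened output distribution while still fitting the ground-truth labels; shrinking the student architecture is then a separate design choice.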

Cited by 24 publications (16 citation statements). References 18 publications.

Citation statements:
“…On the other hand, with the advent of deep learning techniques, the size of machine learning models has grown rapidly, driven by improvements in processor speeds and the availability of large training data. However, embedded systems cannot sustain the resource requirements of standard deep learning techniques, which are suited to GP-GPUs [6,14,33].…”
Section: Introduction
“…TinyML has been applied to several different classes of problems and devices, such as audio processing and sound event detection [6,38], biosignal processing [21], gesture recognition [43], and general time-series data [15]. Among the several application domains in which to explore this novel trend, computer vision is a prime target for optimisation, since it accounts for the largest computational cost in network inference.…”
Section: Introduction
“…Going from state-of-the-art neural models to an actual implementation on an IoT device involves multiple stages, as depicted in Fig. 1. In our previous publication [14], we presented a KD approach to compress a SED classifier composed of the publicly available VGGish feature extractor [15] and a recurrent classifier. Unlike common applications of KD, which aim at improving performance or at modest reductions of the model dimensions, we obtained very high compression factors, reducing the network size from approximately 70 million parameters to nearly 20 thousand.…”
Section: Introduction
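The paper's own code is not reproduced here; as a rough, hypothetical illustration of the scale that the quoted compression implies, the sketch below builds a small recurrent classifier of the same order of magnitude (about 17 thousand parameters) and counts them with Keras. The layer sizes and the 10-class output are assumptions, not the authors' exact student network:

```python
import tensorflow as tf

# Hypothetical compact student: one GRU over 64-band log-mel frames.
# Sizes are illustrative; the distilled network in [14] is reported
# only as "nearly 20 thousand parameters".
student = tf.keras.Sequential([
    tf.keras.layers.GRU(48, input_shape=(None, 64)),  # (time, mel bands)
    tf.keras.layers.Dense(10, activation="sigmoid"),  # example: 10 classes
])
student.summary()  # roughly 17k trainable parameters in total
```

For comparison, the original teacher (VGGish plus a recurrent classifier) accounts for approximately 70 million parameters, which is what makes such aggressive distillation necessary on IoT-class hardware.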
“…In this paper, we focus in particular on: i) a preliminary analysis of the computational and memory requirements, to understand which models can be afforded by a given class of microcontrollers; ii) the quantization of the network parameters and activations, presenting two different strategies to select the best fixed-point representation for each layer; iii) an implementation of the reduced network on a microcontroller with resources typical of an IoT end-node, building upon the network reduction strategies presented in [14]; and iv) an evaluation of the accuracy of the actual implementation. In addition, we present an improvement of the KD approach of [14]: distillation is performed in two stages, in which the adaptation of the pre-trained VGGish feature extractor to the in-domain data is separated from the actual parameter distillation, leading to a further improvement in classification accuracy.…”
Section: Introduction
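Point ii) in the statement above describes per-layer fixed-point quantization. As a minimal sketch of one plausible selection strategy (an assumption for illustration, not necessarily either of the two strategies in the paper): for a fixed word length, sweep the number of fractional bits per layer and keep the Q-format that minimizes the quantization error on that layer's weights.

```python
import numpy as np

def best_q_format(weights, word_bits=8):
    """Return the number of fractional bits (and resulting MSE) that
    best represents `weights` in signed fixed point with `word_bits`
    total bits. A per-layer sweep; criteria other than MSE are possible."""
    best_frac, best_err = 0, np.inf
    int_lo, int_hi = -2 ** (word_bits - 1), 2 ** (word_bits - 1) - 1
    for frac in range(word_bits):
        scale = 2.0 ** frac
        quantized = np.clip(np.round(weights * scale), int_lo, int_hi) / scale
        err = float(np.mean((weights - quantized) ** 2))
        if err < best_err:
            best_frac, best_err = frac, err
    return best_frac, best_err

# Layers with small dynamic range get more fractional bits:
w = 0.2 * np.random.randn(1000)
print(best_q_format(w, word_bits=8))
```

Activations can be handled the same way, using value ranges collected on a calibration set rather than the stored weights.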