MSFF-Net: Multi-scale feature fusing networks with dilated mixed convolution and cascaded parallel framework for sound event detection

Wang, Yingbin; Zhao, Guanghui; Xiong, Kai; Shi, Guangming

doi:10.1016/j.dsp.2021.103319

Cited by 7 publications (1 citation statement)

References 11 publications

“…Currently, with the tremendous improvement in computing power and data resources, deep learning-based methods have become the mainstream approach to SED tasks. For instance, the multi-scale feature fusing networks (MFFNs) [18] method replaces point sampling in dilated convolutions with region sampling; this mixed dilated convolution can better capture the neighboring information of audio and, combined with feature fusion, achieves the SED task. Zhao et al. [19] utilize a CRNN as the detection network for SED systems and employ a differentiable soft median filter.…”

Section: Literature Review (mentioning, confidence: 99%)
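The statement above says MSFF-Net replaces point sampling in dilated convolutions with region sampling. The sketch below illustrates that contrast in one dimension under a loudly stated assumption: "region sampling" is modeled here as averaging the `2 * region + 1` values centred on each dilated tap. The function name `dilated_conv1d` and the `region` parameter are hypothetical, and this is not MSFF-Net's actual layer.

```python
from typing import List


def dilated_conv1d(x: List[float], w: List[float], dilation: int, region: int = 0) -> List[float]:
    """Valid-mode 1-D dilated cross-correlation (the "convolution" of deep-learning libraries).

    region == 0: classic point sampling; tap j reads the single value x[t + j * dilation].
    region > 0:  ASSUMED region sampling; tap j reads the mean of the
                 2 * region + 1 values centred on t + j * dilation, so samples
                 skipped by the dilation still contribute to the output.
    """
    if dilation < 1 or region < 0:
        raise ValueError("dilation must be >= 1 and region must be >= 0")
    span = (len(w) - 1) * dilation
    out = []
    # Start at `region` and stop early so every averaging window stays inside x.
    for t in range(region, len(x) - span - region):
        acc = 0.0
        for j, wj in enumerate(w):
            centre = t + j * dilation
            window = x[centre - region : centre + region + 1]
            acc += wj * (sum(window) / len(window))
        out.append(acc)
    return out


# A single impulse exposes the "holes" that point sampling leaves between taps.
impulse = [0.0] * 9
impulse[4] = 1.0
kernel = [1.0, 1.0, 1.0]

point = dilated_conv1d(impulse, kernel, dilation=2)
regional = dilated_conv1d(impulse, kernel, dilation=2, region=1)
print(point)                            # → [1.0, 0.0, 1.0, 0.0, 1.0]
print([round(v, 3) for v in regional])  # → [0.667, 0.333, 0.667]
```

With point sampling the response alternates between 1 and 0 (the gridding effect of dilation), whereas the region-averaged variant gives a nonzero response at every output position, which is one way to read the claim that mixed dilated convolution "better captures the neighboring information of audio". In valid mode the averaging windows also trim `region` extra samples from each end.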

Sound Event Detection with Perturbed Residual Recurrent Neural Network

Yuan, Yang, Guo (2023). Electronics.

Sound event detection (SED) is of great practical and research significance owing to its wide range of applications. However, SED performance depends heavily on the size of the training dataset, and labeled data are often severely scarce in real-world scenarios. In this study, an improved mean teacher model is used for semi-supervised SED, and a perturbed residual recurrent neural network (P-RRNN) is proposed as the SED network. The residual structure is employed to alleviate the problem of network degradation, and pre-training the improved model on the ImageNet dataset enables it to learn information that is beneficial for event detection, thus improving SED performance. In the post-processing stage, a customized group of median filters with specific window lengths is designed to smooth each event type effectively and minimize the impact of background noise on detection accuracy. Experiments on the publicly available Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 dataset show that the P-RRNN used for SED in this study effectively enhances the detection capability of the model. The detection system achieves a macro event-based F1 score of 38.8% on the validation set and 40.5% on the evaluation set, indicating that the proposed method can adapt to complex and dynamic SED scenarios.
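The abstract describes post-processing with a group of median filters whose window lengths are chosen per event type. A minimal sketch of that idea follows, assuming frame-level probabilities are first thresholded at 0.5 and then median-filtered with a class-specific odd window. The class names, probabilities, window lengths, and the helper names `median_smooth` and `postprocess` are all hypothetical and do not reproduce the paper's settings.

```python
from statistics import median
from typing import Dict, List


def median_smooth(seq: List[int], win: int) -> List[int]:
    """Centred median filter with an odd window; edges are padded by repeating end values."""
    if win < 1 or win % 2 == 0:
        raise ValueError("window length must be a positive odd integer")
    if not seq:
        return []
    half = win // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i : i + win]) for i in range(len(seq))]


def postprocess(
    probs: Dict[str, List[float]], windows: Dict[str, int], threshold: float = 0.5
) -> Dict[str, List[int]]:
    """Binarize each class's frame-level probabilities, then smooth with that class's own window."""
    decisions = {}
    for label, frame_probs in probs.items():
        binary = [1 if p >= threshold else 0 for p in frame_probs]
        decisions[label] = median_smooth(binary, windows[label])
    return decisions


# Hypothetical frame-level outputs for two event classes (not real model output).
probs = {
    "Speech": [0.9, 0.2, 0.8, 0.9, 0.1, 0.1],  # long event with a one-frame dropout
    "Dog": [0.1, 0.7, 0.1, 0.1, 0.6, 0.1],  # two short, one-frame barks
}
windows = {"Speech": 3, "Dog": 1}  # hypothetical per-class window lengths

print(postprocess(probs, windows))
# → {'Speech': [1, 1, 1, 1, 0, 0], 'Dog': [0, 1, 0, 0, 1, 0]}
print(median_smooth([0, 1, 0, 0, 1, 0], 3))  # a shared window of 3 erases the barks
# → [0, 0, 0, 0, 0, 0]
```

The toy example shows why per-class windows can help: a window of 3 fills the one-frame dropout inside the long "Speech" event, but the same window applied to "Dog" would delete both short barks, so a shorter window is kept for that class.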
