Urban sound event detection lets a robot automatically preload relevant information so that it can handle tasks across a variety of scenes. Two limitations stand in the way: sounds with similar timbre are easily confused, and scene recognition is constrained by the audio collection devices. To address them, this paper proposes a fusion model based on the self-attention mechanism. The model consists of a scattering transform and a self-attention model. The scattering transform computes modulation spectrum coefficients of multiple orders through cascades of wavelet convolutions and modulus operators. Unlike Mel-scale Frequency Cepstral Coefficients (MFCC), it is learnable, and it can better recover the semantic features of sound scenes with similar timbres. The Transformer has been highly effective in Natural Language Processing (NLP) thanks to its self-attention mechanism. Here, the self-attention mechanism of its encoder is used mainly to make the feature granularity consistent and thereby refine the features. In addition, the Focal Loss function is adopted to mitigate the imbalance in the sample distribution. The Google-Command and ESC-50 datasets are used to supplement the scene categories of UrbanSound8K. The parameters of the learnable filters that performed well on UrbanSound8K are retained and fine-tuned on the other two datasets, which have less data and more target categories. The effect of slice duration on the model is also explored. Experimental results show that the model achieves better performance across a wide range of scenes.
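To illustrate how a focal loss counteracts an imbalanced sample distribution, the following is a minimal sketch of the standard multi-class form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), written in plain Python. The function names and the values gamma = 2.0 and alpha = 1.0 are assumptions for illustration only; the abstract does not state which settings or implementation the model uses.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    gamma and alpha are illustrative defaults, not values from the paper.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples, so hard (often minority-class) samples dominate the gradient.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# An "easy" sample (true class has the largest score) and a "hard" one.
logits = [5.0, 0.0, 0.0]
easy = focal_loss(logits, target=0)
hard = focal_loss(logits, target=1)
print(f"easy sample: {easy:.6f}, hard sample: {hard:.6f}")
```

With gamma set to 0 and alpha set to 1 the expression reduces to ordinary cross-entropy, which makes the modulating factor easy to check in isolation.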