2020
DOI: 10.1109/access.2020.3032226

Environment Sound Classification Based on Visual Multi-Feature Fusion and GRU-AWS

Abstract: There are two major questions regarding Environmental Sound Classification (ESC): what is the best audio recognition framework, and what is the most robust audio feature? To investigate these problems, a Gated Recurrent Unit (GRU) network was used in this paper to analyze the effect of single features such as the Mel Scale Spectrogram (Mel), Log-Mel Scale Spectrogram (LM), and Mel Frequency Cepstral Coefficients (MFCC), as well as the multi-feature combinations Mel-MFCC, LM-MFCC, and Mel-LM-MFCC (T-M). The experiment resu…
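
The abstract's recognition back-end is a GRU network fed with spectrogram-style features. The sketch below is a minimal, generic GRU classifier in PyTorch; the layer sizes, the bidirectional setting, the temporal mean pooling, and the class count are illustrative assumptions and do not reproduce the paper's GRU-AWS architecture.

```python
# Minimal GRU audio-classifier sketch (PyTorch).
# Hyperparameters and the mean-pooling readout are illustrative assumptions;
# this is not the paper's GRU-AWS model.
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 128, num_classes: int = 50):
        super().__init__()
        # Bidirectional GRU runs over the time axis of the (fused) feature matrix.
        self.gru = nn.GRU(n_features, hidden_size, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):             # x: (batch, time, n_features)
        out, _ = self.gru(x)          # out: (batch, time, 2 * hidden_size)
        pooled = out.mean(dim=1)      # simple temporal average pooling
        return self.fc(pooled)        # class logits

# Example: a batch of 8 clips, 431 frames, 128-dimensional fused features.
logits = GRUClassifier(n_features=128)(torch.randn(8, 431, 128))
print(logits.shape)                   # torch.Size([8, 50])
```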

Cited by 18 publications (6 citation statements)
References 42 publications
“…The performance of the model is measured with seven evaluators: accuracy (14), sensitivity (15), specificity (16), precision (17), the F1-score (18), Cohen's kappa (19), and the Matthews correlation coefficient (MCC) (20). The model was assessed using these evaluation indices.…”
Section: Model Evaluation
confidence: 99%
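
The seven evaluators quoted above are all standard confusion-matrix-based metrics. A minimal sketch of computing them with scikit-learn, assuming a binary labelling and toy labels (not data from the cited paper):

```python
# Sketch: the seven evaluation metrics listed above, computed with scikit-learn.
# The toy labels are illustrative; a multi-class task would use averaged variants.
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, cohen_kappa_score, matthews_corrcoef,
                             confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (toy example)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (toy example)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "accuracy":    accuracy_score(y_true, y_pred),
    "sensitivity": recall_score(y_true, y_pred),     # recall = TP / (TP + FN)
    "specificity": tn / (tn + fp),                   # no direct sklearn helper
    "precision":   precision_score(y_true, y_pred),
    "f1":          f1_score(y_true, y_pred),
    "kappa":       cohen_kappa_score(y_true, y_pred),
    "mcc":         matthews_corrcoef(y_true, y_pred),
}
print(metrics)
```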
“…Our feature engineering process was derived from reference [31]. Fusing multi-spectrogram features into one new feature has been proposed to improve sound recognition accuracy [31]. A total of three features were extracted.…”
Section: Methods
confidence: 99%
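
The fused multi-spectrogram feature referred to here (three features extracted and combined into one input) can be illustrated with librosa. The frame parameters, the number of Mel bands, and the simple channel stacking below are assumptions, not the cited paper's exact recipe:

```python
# Sketch: extract Mel, Log-Mel, and MFCC features and stack them as channels
# of one fused input. "clip.wav" is a hypothetical file; n_fft, hop_length,
# and n_mels are assumed values.
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=22050)

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=60)
log_mel = librosa.power_to_db(mel)                        # Log-Mel spectrogram
mfcc = librosa.feature.mfcc(S=log_mel, sr=sr, n_mfcc=60)  # MFCCs from the same frames

# Fuse the three time-aligned feature maps into one 3-channel tensor.
fused = np.stack([mel, log_mel, mfcc], axis=0)            # shape: (3, 60, n_frames)
print(fused.shape)
```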
“…In the task of fusing the enhancement front-end with the recognition back-end, the features fed to the back-end are selected from either the enhancement front-end's output or the clean sound sequence according to a certain probability distribution. At the initial stage of training, because the performance of the enhancement model has not yet improved, the features fed to the back-end recognizer may not represent the audio information well, leading to difficulties in model convergence [23]. Using the features of the clean sequence can correct the model, reduce its divergence, and speed up convergence.…”
Section: Algorithm Design
confidence: 99%
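
The probabilistic choice between enhanced and clean features described above resembles scheduled sampling. Below is a minimal sketch of such a selection rule; the linear decay schedule, the variable `p_clean`, and the function name are assumptions rather than the cited paper's exact scheme:

```python
# Sketch: during joint training, feed the recognition back-end either the
# enhancement front-end's output or the clean feature sequence, chosen at
# random. The linear decay of p_clean over training is an assumed schedule.
import random

def pick_backend_input(enhanced_feats, clean_feats, epoch, total_epochs):
    """Return the clean features with probability p_clean, else the enhanced ones."""
    # Rely mostly on clean features early on, then shift to the enhancer's
    # output as it improves (p_clean decays linearly from 1.0 to 0.0).
    p_clean = max(0.0, 1.0 - epoch / total_epochs)
    return clean_feats if random.random() < p_clean else enhanced_feats

# Hypothetical use inside a training loop:
# feats = pick_backend_input(enhancer(noisy), clean, epoch, total_epochs=100)
# loss = criterion(recognizer(feats), labels)
```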