2017
DOI: 10.1007/s11042-017-5539-3

Spectrogram based multi-task audio classification

Abstract: Audio classification is regarded as a great challenge in pattern recognition. Although audio classification tasks are usually treated as independent, they are essentially related to each other; speaker accent and speaker identification are one such pair. In this paper, we propose a Deep Neural Network (DNN)-based multi-task model that exploits such relationships and deals with multiple audio classification tasks simultaneously. We term our model the gated Residual Networks (GResNets) model since it integrate…
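The abstract sketches the core idea: one network trained on several related audio tasks at once. As a rough illustration of multi-task learning over spectrogram inputs (not the paper's actual GResNets architecture, whose gated residual blocks are described in the full text), here is a minimal PyTorch sketch with a shared convolutional trunk and one classification head per task; all layer sizes, task counts, and task names are hypothetical.

```python
# Minimal multi-task spectrogram classifier sketch (illustrative only;
# the paper's GResNets differ in their gated residual blocks).
import torch
import torch.nn as nn

class MultiTaskSpectrogramNet(nn.Module):
    def __init__(self, num_classes_per_task):
        super().__init__()
        # Shared convolutional trunk over (batch, 1, freq, time) spectrograms.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> fixed-size embedding
            nn.Flatten(),
        )
        # One linear head per task (e.g., accent, speaker ID -- hypothetical).
        self.heads = nn.ModuleList(
            [nn.Linear(32, n) for n in num_classes_per_task]
        )

    def forward(self, spec):
        shared = self.trunk(spec)
        return [head(shared) for head in self.heads]

# Joint training step: sum the per-task cross-entropy losses.
model = MultiTaskSpectrogramNet(num_classes_per_task=[5, 20])
spec = torch.randn(8, 1, 128, 256)  # batch of log-spectrograms
labels = [torch.randint(0, 5, (8,)), torch.randint(0, 20, (8,))]
logits = model(spec)
loss = sum(nn.functional.cross_entropy(l, y) for l, y in zip(logits, labels))
loss.backward()
```

Summing the task losses is the simplest joint objective; weighting the tasks differently is a common refinement.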

Cited by 135 publications (53 citation statements)
References 25 publications (40 reference statements)
“…Haytham et al. [38] described a neural-network and RNN-based technique for SER that trains a DNN on spectrograms of frames, which has a high computational cost and did not achieve high accuracy. Tripathi et al. [43] described deep learning models for SER using transcripts and phonemes, training different models on different features to raise accuracy to 71%, but they used the same architecture that is used for computer-vision tasks.…”
Section: Discussion (mentioning)
confidence: 99%
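For context on the frame-spectrogram inputs these SER pipelines train on, here is a minimal log-spectrogram extraction sketch with SciPy; the 25 ms frame and 10 ms hop are common defaults, not values taken from the cited papers.

```python
# Frame-level log-spectrogram extraction, the kind of input the cited
# SER pipelines feed their networks. Parameters are illustrative.
import numpy as np
from scipy import signal

def log_spectrogram(audio, sr=16000, frame_ms=25, hop_ms=10):
    nperseg = int(sr * frame_ms / 1000)           # 25 ms analysis frames
    noverlap = nperseg - int(sr * hop_ms / 1000)  # 10 ms hop
    f, t, sxx = signal.spectrogram(audio, fs=sr,
                                   nperseg=nperseg, noverlap=noverlap)
    return np.log(sxx + 1e-10)                    # log compression

audio = np.random.randn(16000)  # 1 s of dummy audio
spec = log_spectrogram(audio)   # shape: (freq_bins, frames)
```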
“…This demonstrates the significance and efficiency of the proposed DSCNN model on the RAVDESS dataset, where it outperformed prior SER results. Yuni et al. [43] presented a spectrogram-based CNN model for multi-class audio classification that combines two models, achieving 64.48% accuracy in multi-task SER. Jalal et al. [44] and Anjali et al. [45] used the log spectrogram and spectral features to recognize emotion in speech data with 68% and 75% accuracy, respectively.…”
Section: Discussion (mentioning)
confidence: 99%
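The "combination of two models" noted above refers to the GResNet idea of fusing two network streams. Purely as an illustration of one plausible element-wise gating scheme (not a claim about the paper's exact formulation), here is a short PyTorch sketch:

```python
# Illustrative gated fusion of two convolutional streams over the same
# spectrogram input; one stream is squashed and used to gate the other.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.stream_a = nn.Conv2d(1, channels, 3, padding=1)
        self.stream_b = nn.Conv2d(1, channels, 3, padding=1)

    def forward(self, spec):
        a = self.stream_a(spec)
        b = torch.sigmoid(self.stream_b(spec))  # values in (0, 1) act as a gate
        return a * b                            # element-wise gating

fused = GatedFusion()(torch.randn(4, 1, 128, 256))
```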
“…In Oramas [20], a hybrid approach was presented that combined diverse modalities (album cover images, reviews, and audio tracks) for multi-label music genre classification by applying deep learning techniques appropriate to each modality, an approach that outperformed single-modality methods. Finally, it should be mentioned that many machine learning methods have also been proposed for human voice classification tasks: emotion recognition [21], English accent classification, and gender classification [22], to name a few.…”
Section: Introduction (mentioning)
confidence: 99%
“…Over the last decade, research in sound classification and recognition has gained in popularity and rapidly broadened in its applications, from the more traditional focus on speech recognition [1] and music genre classification [2] to biometric identification [3], computer-aided heart sound detection [4], environmental audio scene and sound recognition [5,6], biodiversity assessment [7], human voice classification and emotion recognition [8], and English accent classification and gender identification [9], to list a few of a wide range of application areas. As with research in pattern recognition generally, the features fed into classifiers were initially engineered, which in the case of sound applications meant extracting from raw audio traces such descriptors as the Statistical Spectrum Descriptor and the Rhythm Histogram [10].…”
Section: Introduction (mentioning)
confidence: 99%
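The engineered descriptors named here are, at heart, statistics computed over a time-frequency representation. As a simplified, hypothetical stand-in in the spirit of the Statistical Spectrum Descriptor (not its published definition), here is a sketch of per-band spectral statistics:

```python
# Toy engineered-feature extraction: per-frequency-band statistics of a
# log-spectrogram, flattened into a fixed-length vector. A simplified
# approximation in the spirit of the Statistical Spectrum Descriptor,
# not the published SSD algorithm.
import numpy as np
from scipy import signal

def spectral_statistics(audio, sr=16000):
    _, _, sxx = signal.spectrogram(audio, fs=sr, nperseg=512)
    logspec = np.log(sxx + 1e-10)
    # Mean, std, min, max over time for each frequency bin.
    stats = [logspec.mean(axis=1), logspec.std(axis=1),
             logspec.min(axis=1), logspec.max(axis=1)]
    return np.concatenate(stats)  # fixed-length feature vector

features = spectral_statistics(np.random.randn(16000))
```

Such fixed-length vectors were typically fed to classical classifiers (SVMs, random forests) before learned spectrogram features became the norm.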