Spectral Features Based on Local Hu Moments of Gabor Spectrograms for Speech Emotion Recognition

Tao, Huawei; Liang, Ruiyu; Zha, Cheng; Zhang, Xinran; Zhao, Li

doi:10.1587/transinf.2015edl8258

Cited by 7 publications (8 citation statements). References: 12 publications.


“…Effects of the experiment are conspicuous. On the ABC dataset, our method also outperforms [33], [37], [38], [40] in terms of WA. And we report a UA of 57.59% on the ABC dataset, which outperforms all four compared works, i.e., 56.1% by [37], 55.5% by [33], 56.11% by [38], and 52.26% by [40]. On the EMO-DB dataset, our method also clearly outperforms all four compared works.…”

Section: Table 3, T-test on Test Results (mentioning; confidence: 77%)
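The comparison above ranks methods by weighted accuracy (WA) and unweighted accuracy (UA). As these terms are conventionally used in speech emotion recognition, WA is the overall accuracy over all test utterances, while UA is the mean of the per-class recalls, so rare emotion classes count as much as frequent ones. The following minimal Python sketch computes both from label lists. The function name and the toy labels are illustrative assumptions, not data from the cited works.

```python
from collections import Counter


def weighted_and_unweighted_accuracy(y_true, y_pred):
    """Return (WA, UA) for parallel sequences of true and predicted labels.

    WA: fraction of all samples classified correctly (overall accuracy).
    UA: average over the classes present in y_true of per-class recall
        (correctly classified samples of a class / samples of that class).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    wa = correct / len(y_true)

    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    ua = sum(hits[c] / support[c] for c in support) / len(support)
    return wa, ua


# Toy, imbalanced example: 6 "neutral" and 2 "angry" utterances,
# with one "angry" utterance misclassified as "neutral".
y_true = ["neutral"] * 6 + ["angry"] * 2
y_pred = ["neutral"] * 6 + ["neutral", "angry"]
wa, ua = weighted_and_unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.3f}, UA = {ua:.3f}")  # WA = 0.875, UA = 0.750
```

On imbalanced data the two metrics diverge: here the single error on the minority class costs 12.5 points of WA but 25 points of UA, which is why papers on emotion corpora usually report both.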


Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition

Jiang, Tao, et al. 2019, IEEE Access (self-citation)

Speech is the most effective way for people to exchange complex information, and recognizing the emotional information it contains is one of the important challenges in artificial intelligence. To better capture emotional features in speech signals, a parallelized convolutional recurrent neural network (PCRN) with spectral features is proposed for speech emotion recognition. First, frame-level features are extracted from each utterance, and a long short-term memory (LSTM) network learns these features frame by frame. At the same time, the deltas and delta-deltas of the log Mel-spectrogram are calculated and reconstructed into three channels (static, delta, and delta-delta), and a convolutional neural network (CNN) learns from these 3-D features. The two learned high-level features are then fused and batch-normalized, and finally a softmax classifier classifies the emotion. Because the PCRN model processes two different types of features in parallel, it can better learn subtle changes in emotion. Experimental results on four public datasets show that the proposed method outperforms previous works.

Index Terms: speech emotion recognition, parallelized convolutional recurrent neural network, convolutional neural network, long short-term memory.
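The three-channel CNN input and the fusion step described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Deltas use the common regression formula d_t = sum_{k=1..N} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1..N} k^2) with an assumed window N = 2 and edge replication. The log Mel-spectrogram is a random placeholder rather than one computed from audio, and the LSTM and CNN branch outputs are random stand-ins. Fusion is assumed to be concatenation, followed by batch normalization over the batch (no learned scale or shift) and a softmax layer. The function names, feature sizes, and five-class output are hypothetical.

```python
import numpy as np


def deltas(feat, n=2):
    """Regression-based deltas along the time axis (assumed window n).

    feat: array of shape (n_mels, n_frames). Edge frames are replicated
    so the output has the same shape as the input.
    """
    n_frames = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (n, n)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(feat, dtype=float)
    for k in range(1, n + 1):
        # c_{t+k} - c_{t-k} for every frame t, read from the padded array
        out += k * (padded[:, n + k:n + k + n_frames] - padded[:, n - k:n - k + n_frames])
    return out / denom


def three_channel(log_mel, n=2):
    """Stack static, delta, and delta-delta maps: shape (3, n_mels, n_frames)."""
    d1 = deltas(log_mel, n)
    d2 = deltas(d1, n)
    return np.stack([log_mel, d1, d2], axis=0)


def fuse_bn_softmax(h_lstm, h_cnn, w, b, eps=1e-5):
    """Concatenate branch embeddings, batch-normalize, apply a softmax layer.

    h_lstm: (batch, d1), h_cnn: (batch, d2), w: (d1 + d2, n_classes), b: (n_classes,).
    Uses batch statistics only (training-mode batch norm without scale/shift).
    """
    fused = np.concatenate([h_lstm, h_cnn], axis=1)
    fused = (fused - fused.mean(axis=0)) / np.sqrt(fused.var(axis=0) + eps)
    logits = fused @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
log_mel = rng.standard_normal((40, 100))  # placeholder: 40 Mel bands x 100 frames
x3 = three_channel(log_mel)
print(x3.shape)  # (3, 40, 100)

h_lstm = rng.standard_normal((4, 8))   # stand-in for LSTM-branch output, batch of 4
h_cnn = rng.standard_normal((4, 16))   # stand-in for CNN-branch output
probs = fuse_bn_softmax(h_lstm, h_cnn, rng.standard_normal((24, 5)), np.zeros(5))
print(probs.shape)  # (4, 5)
```

Treating static, delta, and delta-delta maps as channels lets a 2-D CNN see local spectral change in the same way an image CNN sees color channels, while the LSTM branch follows the frame-level sequence.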


“…1 to 3 summarize the performance improvements of HPCB in terms of UAR over the related peer methods on the CASIA, EMODB, and SAVEE databases. Among them, the works in [7], [29], [30], and [32] used previously reported results as their baselines, while the model of [21] was originally proposed for automatic speech recognition. When the researchers in [34], [35], [36], and [37] applied it to speech emotion recognition, the databases they used also differed from those used in this study.…”

Section: The Performance of HPCB and Its Peer Methods (mentioning; confidence: 99%)

A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

Zhang, Huang, Han, 2021 (preprint)

Speech emotion recognition remains a heavy lift in natural language processing. It places strict requirements on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM (HPCB) model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model can exploit spatiotemporal information more effectively, and it achieves 84.65%, 79.67%, and 56.50% unweighted average recall (UAR) on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model achieves consistently better performance.
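The abstract specifies the two HPCB branches only at the layer level. The sketch below is a hypothetical NumPy forward pass that mirrors that description: the left branch applies two dense layers and a Bi-LSTM, and the right branch applies a dense layer, a 1-D convolution over time, and a Bi-LSTM. Everything else is an assumption: layer widths, ReLU activations, kernel size 3, using the final forward and backward LSTM states, fusing the branches by concatenation, the number of classes, and the 39-dimensional placeholder frame features. Weights are random and untrained, so the output only illustrates shapes and data flow, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)  # source of random, untrained weights


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense(x, d_out):
    """Time-distributed dense layer with ReLU; x: (T, D) -> (T, d_out)."""
    w = rng.standard_normal((x.shape[-1], d_out)) * 0.1
    return np.maximum(0.0, x @ w)


def conv1d_same(x, c_out, k=3):
    """1-D convolution over time with zero 'same' padding and ReLU; x: (T, C_in)."""
    w = rng.standard_normal((k, x.shape[1], c_out)) * 0.1
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    out = np.stack([np.einsum("kc,kco->o", xp[t:t + k], w) for t in range(x.shape[0])])
    return np.maximum(0.0, out)


def lstm_last_state(x, h_dim):
    """Run one LSTM direction over x (T, D) and return the final hidden state."""
    W = rng.standard_normal((x.shape[1], 4 * h_dim)) * 0.1
    U = rng.standard_normal((h_dim, 4 * h_dim)) * 0.1
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    for x_t in x:
        z = x_t @ W + h @ U
        i = sigmoid(z[:h_dim])                  # input gate
        f = sigmoid(z[h_dim:2 * h_dim])         # forget gate
        g = np.tanh(z[2 * h_dim:3 * h_dim])     # candidate cell state
        o = sigmoid(z[3 * h_dim:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def bilstm(x, h_dim):
    """Concatenate final states of a forward and a time-reversed LSTM."""
    return np.concatenate([lstm_last_state(x, h_dim), lstm_last_state(x[::-1], h_dim)])


def hpcb_like_forward(frames, n_classes=6, h_dim=32):
    left = bilstm(dense(dense(frames, 64), 64), h_dim)         # dense -> dense -> Bi-LSTM
    right = bilstm(conv1d_same(dense(frames, 64), 64), h_dim)  # dense -> conv -> Bi-LSTM
    fused = np.concatenate([left, right])                      # assumed fusion: concatenation
    logits = fused @ (rng.standard_normal((fused.size, n_classes)) * 0.1)
    p = np.exp(logits - logits.max())                          # stable softmax
    return p / p.sum()


frames = rng.standard_normal((120, 39))  # placeholder: 120 frames x 39 features each
probs = hpcb_like_forward(frames)
print(probs.shape, round(float(probs.sum()), 6))
```

The heterogeneity lies only in what precedes each Bi-LSTM: the convolution in the right branch mixes neighboring frames before the recurrent layer, while the left branch feeds frame-wise dense projections directly.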


“…Tables 1-3 summarize the performance improvements of HPCB in terms of UAR with respect to the related peer methods on the CASIA, EMODB, and SAVEE databases. Among them, the authors of [9], [41], [42], [43], and [44] used previously reported results as their baselines, while the model of [45] was originally proposed for automatic speech recognition. When the researchers in [46], [47], [48], and [49] applied it to speech emotion recognition, the databases they used also differed from those used in this study.…”

Section: The Performance of HPCB and Its Peer Methods (mentioning; confidence: 99%)

“…
Model           WAR (%)   UAR (%)
GA-BEL [41]     38.55     38.55
HuWSF [42]      43.50     43.50
RDBN [44]       48.50     48.50
PCRN [9]        58.25     58.25
Bi-LSTM [46]    /         75.00
Bi-GRU [47]     /         72.50
CNN [48]        /         76.67
CLDNN [45]      /         61.67
CapsNet [49]    /         63.33
HPCB (Ours)     /         79.67
…”

Section: Model, WAR, UAR (mentioning; confidence: 99%)

A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

2021


Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict requirements on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM (HPCB) model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model can exploit spatiotemporal information more effectively, and it achieves 84.65%, 79.67%, and 56.50% unweighted average recall (UAR) on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model achieves consistently better performance.
