Abstract: In this paper, a novel multiscale amplitude feature is proposed using multiresolution analysis (MRA), and the significance of the vocal tract is investigated for emotion classification from the speech signal. MRA decomposes the speech signal into a number of sub-band signals. The proposed feature is computed by applying a sinusoidal model to each sub-band signal. Different emotions have different impacts on the vocal tract; as a result, the vocal tract responds in a unique way to each emotion. The vocal tract information…
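The pipeline the abstract describes, MRA sub-band decomposition followed by a per-band amplitude feature, can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: it uses a Haar wavelet-packet tree for the decomposition and keeps the largest FFT magnitudes per band as a crude stand-in for a fitted sinusoidal model; the depth and peak count are arbitrary choices.

```python
import numpy as np

def haar_split(x):
    """One level of a Haar analysis filter bank: low- and high-band halves."""
    x = x[: len(x) // 2 * 2]                     # force even length
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def wavelet_packet(x, depth):
    """Full wavelet-packet tree: 2**depth sub-band signals."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        bands = [half for b in bands for half in haar_split(b)]
    return bands

def multiscale_amplitude(x, depth=3, n_peaks=2):
    """Per sub-band, keep the n_peaks largest FFT magnitudes
    (a rough proxy for sinusoidal-model amplitude parameters)."""
    feats = []
    for band in wavelet_packet(x, depth):
        mag = np.abs(np.fft.rfft(band))
        feats.extend(np.sort(mag)[-n_peaks:][::-1])
    return np.array(feats)
```

With depth 3 the signal is split into 8 sub-bands, giving a 16-dimensional feature vector for `n_peaks=2`.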
“…Current methods of emotion recognition mainly involve facial expression recognition [3][4][5][6], speech emotion recognition [7][8][9], gesture expression recognition [10], text recognition [11], physiological pattern recognition, and multimodal emotion recognition [12][13][14][15]. In practical applications, the non-contact method of extracting physiological parameters for face imaging has attracted special attention.…”
An emotion recognition method based on multispectral imaging technology and tissue oxygen saturation (StO2) is proposed in this study. This method is called the spatial-spectral-temporal adjustment convolutional neural network (SACNN). First, we extract the StO2 content of the emotionally sensitive nose area through real-time multispectral imaging. Compared with facial expression data, StO2 data are more objective and cannot be artificially controlled or altered. Second, we construct a clustering algorithm based on the emotional state by extracting the spectral, StO2, and spatial features of the nose image to obtain accurate signals from emotionally sensitive areas. To exploit the correlation between spectral and spatial signals, we propose an adjustment-based CNN module, which reorganizes the relationships among all previous layers of the feature map, thereby making the relationships among layers close and highly quantitative. The features extracted through this method are consistent with the spatial-spectral features. Third, we feed the extracted temporal feature signal into a long short-term memory module, finally completing the correlation across the spatial-spectral-temporal features. Experimental results show that the SACNN algorithm reaches 90% accuracy in emotion recognition, and the proposed method is more competitive than state-of-the-art approaches. To the best of our knowledge, this study is the first to use time-series StO2 signals for emotion recognition. INDEX TERMS: Multispectral imaging, oxygen saturation, spatial-spectral-temporal adjustment convolutional neural network.
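The StO2 step rests on a standard idea: tissue absorbance at two (or more) wavelengths can be inverted for relative oxy- and deoxy-hemoglobin concentrations via a modified Beer-Lambert model. The sketch below illustrates that inversion only; the wavelengths and extinction coefficients are illustrative placeholders, not values from the cited study, and path-length/scattering terms are folded into the relative concentrations.

```python
import numpy as np

# Illustrative extinction coefficients for [HbO2, Hb] at two example
# wavelengths (660 nm, 940 nm); real values come from published tables.
EXT = np.array([[320.0, 3227.0],    # 660 nm
                [1214.0, 693.0]])   # 940 nm

def sto2_from_reflectance(r1, r2):
    """Estimate per-pixel tissue oxygen saturation from reflectance images
    at two wavelengths via a modified Beer-Lambert inversion."""
    absorb = np.stack([-np.log(np.clip(r1, 1e-6, None)),
                       -np.log(np.clip(r2, 1e-6, None))])      # (2, H, W)
    # Solve EXT @ [c_HbO2, c_Hb] = absorbance at every pixel at once.
    conc = np.linalg.solve(EXT, absorb.reshape(2, -1))          # (2, H*W)
    c_oxy, c_deoxy = np.clip(conc, 0.0, None)
    return (c_oxy / (c_oxy + c_deoxy + 1e-12)).reshape(r1.shape)
```

The per-pixel StO2 maps produced this way would then form the time-series input to the clustering and CNN-LSTM stages described above.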
“…It is evident from the literature that the combination of speech features, i.e. feature fusion, increases the classification accuracy of the SER system [6,23,28] and hence became the most common practice in this field.…”
Section: Continuous Features (mentioning)
confidence: 99%
“…Mel-frequency cepstral coefficients (MFCCs) [11,21,22], linear prediction coefficients (LPCs) [23], relative spectral perceptual linear prediction (RASTA-PLP) [16], and variants of these features like modified MFCC (M-MFCC) [13], and feature fusion of MFCC and short-time energy features with velocity (Δ) and acceleration (ΔΔ) [23] are some of the well-known spectral features that are used for speech emotion recognition. Apart from these, log frequency power coefficients (LFPCs) [24], Fourier parameter features [25], time-frequency features with AMS-GMM mask [26], modulation spectral features [27], and amplitude-based features [28] are some of the variants of spectral features that are now used in SER analysis.…”
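The most widely used of these features, the MFCC, can be sketched for a single windowed frame with plain NumPy: power spectrum, mel-spaced triangular filter bank, log compression, then a DCT-II. The filter-bank size, FFT length, and coefficient count here are illustrative defaults, not those of any cited paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge
    return fb

def mfcc(frame, sr=16000, n_filters=26, n_ceps=13):
    """MFCCs of one frame: power spectrum -> mel filter bank -> log -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    logmel = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))  # DCT-II basis
    return dct @ logmel
```

The Δ and ΔΔ features mentioned in the quote are then first and second finite differences of these coefficients across consecutive frames.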
In recent times, much research has been progressing in the field of speech emotion recognition (SER). Many SER systems have been developed by combining different speech features to improve their performance. As a result, the classifier becomes more complex in order to train on this large feature set. Additionally, some of the features may be irrelevant to emotion detection, which decreases the emotion recognition accuracy. To overcome this drawback, feature optimization can be performed on the feature sets to obtain the most discriminative emotional feature set before classification. In this paper, semi-nonnegative matrix factorization (semi-NMF) with singular value decomposition (SVD) initialization is used to optimize the speech features. The speech features considered in this work are mel-frequency cepstral coefficients, linear prediction cepstral coefficients, and Teager energy operator-autocorrelation (TEO-AutoCorr). This work uses k-nearest neighbor and support vector machine (SVM) classifiers for the classification of emotions with a 5-fold cross-validation scheme. The datasets considered for the performance analysis are EMO-DB and IEMOCAP. The performance of the proposed SER system using semi-NMF is validated in terms of classification accuracy. The results emphasize that the accuracy of the proposed SER system improves remarkably upon using the semi-NMF algorithm for optimizing the feature sets, compared to the baseline SER system without optimization.
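The core optimization step named here can be sketched as follows. This is a generic semi-NMF (X ≈ F Gᵀ with G ≥ 0, F unconstrained) with the standard closed-form F step and multiplicative G update, seeded from a truncated SVD as one simple way of implementing SVD initialization; the paper's exact variant and rank choices may differ.

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9):
    """Semi-NMF: X ~ F @ G.T with G >= 0, F unconstrained.
    F has a closed-form update; G uses the multiplicative rule."""
    # SVD initialization: signed left factors, rectified right factors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    F = U[:, :k] * s[:k]
    G = np.abs(Vt[:k].T) + eps

    pos = lambda A: (np.abs(A) + A) / 2.0    # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2.0    # elementwise negative part
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)  # closed-form F given G
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))
    return F, G
```

Applied to a feature matrix (utterances × stacked MFCC/LPCC/TEO-AutoCorr features), the nonnegative factor G provides the reduced representation passed to the k-NN or SVM classifier.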
“…The multiscale amplitude feature (abbreviated Mul‐Amp) is a recently proposed (2018) multi‐resolution feature in the time domain 14 . The multi‐resolution structure is obtained through a wavelet packet transform and sub‐band partition.…”
Section: Experiments and Evaluation (mentioning)
confidence: 99%
“…Deb and Dandapat extracted a sub‐band amplitude feature by decomposing the speech signal into multi‐scale frequency bands and applying the Fourier transform. This feature showed good discriminative performance in experiments 14 . However, it partitions the frequency bands uniformly and therefore cannot capture the non‐linear frequency resolution required by the psychoacoustic model.…”
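The uniform-versus-perceptual partition contrast drawn here is easy to make concrete: band edges equally spaced in Hz versus edges equally spaced on a psychoacoustic scale. The sketch below uses the Traunmüller/Zwicker-style Bark approximation as one common choice; the band count and frequency range are illustrative.

```python
import numpy as np

def bark(f_hz):
    """One common Hz -> Bark approximation (Zwicker/Traunmüller style)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_band_edges(n_bands, f_max=8000.0):
    """Band edges equally spaced in Bark, mapped back to Hz by interpolation."""
    f = np.linspace(0.0, f_max, 4096)
    z = bark(f)
    targets = np.linspace(0.0, z[-1], n_bands + 1)
    return np.interp(targets, z, f)

uniform = np.linspace(0.0, 8000.0, 9)    # uniform 8-band partition (Hz)
perceptual = bark_band_edges(8)          # Bark-spaced 8-band partition (Hz)
```

The perceptual partition allocates much narrower bands at low frequencies (its first edge falls near a few hundred Hz instead of 1 kHz), which is exactly the non-linear resolution the quoted criticism says a uniform partition misses.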
Summary
Speech emotion recognition is an important technique for human‐computer interface applications. Because it carries rich emotional information, the spectral feature is widely used for emotion recognition. However, recognition performance is limited by imprecise extraction rules and the uncertain frequency resolution of spectral features. To address this issue, motivated by speech coding, we introduce a psychoacoustic model and provide a perceptual spectral sub‐band partition method that yields a more precise frequency resolution. Moreover, we also propose a new spectral feature computed on the divided sub‐band frequency signals. The proposed feature comprises emotional perception entropy, spectral inclination, and spectral flatness. A Support Vector Machine classifier is then used to recognize the emotion categories. The experimental results show that the proposed spectral feature is superior to the traditional MFCC feature, and also outperforms the state‐of‐the‐art Fourier feature and the multi‐resolution amplitude feature.