“…In [8], the authors proposed a new speech feature combined with an SVM classifier and evaluated it using the EMODB and CASIA databases. In [39], the authors proposed feature extraction in both vowel and non-vowel regions with an extreme learning machine (ELM), which they evaluated on the EMODB and IEMOCAP databases. In [40], the authors proposed a new speech feature combined with an acoustic mask and a likelihood classifier, and they evaluated it using the EMODB database.…”
Section: Confusion Matrix In Three Databases
Many works have focused on speech emotion recognition algorithms; however, most rely on the proper selection of speech acoustic features. In this paper, we propose a novel emotion recognition algorithm that does not rely on any hand-selected speech acoustic features and that incorporates speaker gender information. We aim to benefit from the rich information in raw speech data, without any artificial intervention. In general, speech emotion recognition systems require manual selection of appropriate traditional acoustic features as classifier input. With a deep learning approach, the network automatically selects the important information in the raw speech signal for the classification layer to accomplish emotion recognition, which prevents the omission of emotional information that cannot be directly modeled mathematically as a speech acoustic characteristic. We also add speaker gender information to the proposed algorithm to further improve recognition accuracy. The proposed algorithm combines a Residual Convolutional Neural Network (R-CNN) with a gender information block; the raw speech data is sent to these two blocks simultaneously. The R-CNN obtains the necessary emotional information from the speech data and classifies the emotion category. The proposed algorithm is evaluated on three public databases covering different languages. Experimental results show accuracy improvements of 5.6%, 7.3%, and 1.5% on the Mandarin, English, and German databases, respectively, compared with the existing highest-accuracy algorithms. To verify the generalization of the proposed algorithm, we also use the FAU and eNTERFACE databases; on these two independent databases, the proposed algorithm achieves 85.8% and 71.1% accuracy, respectively.
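The abstract gives the overall structure (an R-CNN over raw speech plus a gender information block feeding a shared classification layer) but not the exact configuration. The following is a minimal sketch of that idea in PyTorch; the channel counts, kernel sizes, concatenation-based fusion, and the shared convolutional stem feeding the gender branch are all illustrative assumptions, not the paper's published design.

```python
# Hedged sketch: residual 1-D CNN over raw speech + gender branch -> classifier.
# All layer sizes and the fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class ResBlock1d(nn.Module):
    """Residual block operating on a 1-D (raw-waveform) feature map."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.bn1 = nn.BatchNorm1d(channels)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection


class GenderAwareEmotionNet(nn.Module):
    """Raw waveform -> emotion features + gender estimate -> emotion logits."""

    def __init__(self, n_emotions=6):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=8), nn.BatchNorm1d(32), nn.ReLU()
        )
        self.res = nn.Sequential(ResBlock1d(32), ResBlock1d(32))
        self.pool = nn.AdaptiveAvgPool1d(1)           # global pooling over time
        self.gender_branch = nn.Sequential(            # outputs P(male), P(female)
            nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2), nn.Softmax(dim=1)
        )
        self.classifier = nn.Linear(32 + 2, n_emotions)

    def forward(self, wave):                           # wave: (batch, 1, samples)
        h = self.pool(self.res(self.stem(wave))).squeeze(-1)   # (batch, 32)
        g = self.gender_branch(h)                               # (batch, 2)
        return self.classifier(torch.cat([h, g], dim=1))        # (batch, n_emotions)


# Example: a batch of 2-second raw waveforms at 16 kHz.
logits = GenderAwareEmotionNet()(torch.randn(4, 1, 32000))
print(logits.shape)  # torch.Size([4, 6])
```

In this sketch the gender branch reads the same pooled embedding as the emotion classifier for brevity; the paper instead routes the raw speech to both blocks in parallel.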
“…Two recent works using the audio modality can be found in [53] and [54]. Deb & Dandapat in 2017 proposed a method for speech emotion classification using vowel-like regions (VLRs) and non-vowel-like regions (non-VLRs).…”
The exponential growth of multimodal content in today's competitive business environment leads to a huge volume of unstructured data. Unstructured big data has no particular format or structure and can be in any form, such as text, audio, images, and video. In this paper, we address the challenges of emotion and sentiment modeling that arise from unstructured big data with different modalities. We first provide an up-to-date review of emotion and sentiment modeling, including state-of-the-art techniques. We then propose a new architecture for multimodal emotion and sentiment modeling for big data. The proposed architecture consists of five essential modules: a data collection module, a multimodal data aggregation module, a multimodal data feature extraction module, a fusion and decision module, and an application module. Novel feature extraction techniques called divide-and-conquer principal component analysis (Div-ConPCA) and divide-and-conquer linear discriminant analysis (Div-ConLDA) are proposed for the multimodal data feature extraction module of the architecture. Experiments on a multicore machine architecture are performed to validate the performance of the proposed techniques.

INDEX TERMS: Big data, affective analytics, emotion recognition, sentiment modeling, unstructured data.
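The excerpt does not define Div-ConPCA, so the following is only one plausible reading of a divide-and-conquer PCA: split the wide feature matrix into column blocks, fit PCA on each block independently (these independent fits are what a multicore machine can run in parallel), and concatenate the per-block projections. The block count and component count are arbitrary illustrative values.

```python
# Hedged sketch of a divide-and-conquer PCA; not the paper's exact Div-ConPCA.
import numpy as np
from sklearn.decomposition import PCA


def div_con_pca(X, n_blocks=4, n_components=8):
    """Fit PCA per column block and return the concatenated projections."""
    blocks = np.array_split(X, n_blocks, axis=1)        # divide step
    projections = [PCA(n_components=n_components).fit_transform(b) for b in blocks]
    return np.hstack(projections)                        # merge step


# Example: 500 samples with 1024 unstructured features each.
X = np.random.randn(500, 1024)
Z = div_con_pca(X)
print(Z.shape)  # (500, 32) -> 4 blocks x 8 components each
```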
“…The MFCC is a widely used spectral feature for speech emotion recognition [26]; it comprises the MFCCs, delta MFCCs, and delta-delta MFCCs, for a total of 39 coefficients.…”
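The 39-dimensional feature referred to above is the standard stack of 13 static MFCCs plus their first- and second-order differences. A minimal illustration with librosa follows; the signal, frame, and hop settings are generic defaults, not values from the cited work.

```python
# Illustration of the 39-coefficient MFCC feature: 13 MFCCs + delta + delta-delta.
import numpy as np
import librosa

# Any mono speech signal works; a synthetic 1-second tone stands in for real data.
sr = 16000
y = 0.1 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13, frames)
delta = librosa.feature.delta(mfcc)                    # first-order differences
delta2 = librosa.feature.delta(mfcc, order=2)          # second-order differences

features = np.vstack([mfcc, delta, delta2])            # (39, frames)
print(features.shape[0])  # 39 coefficients per frame
```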
Summary
Speech emotion recognition is an important technique for human-computer interface applications. Because it carries rich emotional information, the spectral feature is widely used for emotion recognition. However, recognition performance is limited by imprecise feature extraction rules and the uncertain resolution of spectral features. To address this issue, motivated by speech coding, we introduce a psychoacoustic model and provide a perceptual spectral subband partition method that yields a more precise frequency resolution. Moreover, we provide a new spectral feature computed on the divided subband signals. The proposed feature includes emotional perception entropy, spectral inclination, and spectral flatness. A Support Vector Machine classifier is then used to recognize the emotion categories. The experimental results show that the proposed spectral feature is superior to the traditional MFCC feature, and also better than the state-of-the-art Fourier feature and multi-resolution amplitude feature.
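A rough sketch of the subband-feature-then-SVM pipeline described above is given below. The spectral flatness follows the standard geometric-to-arithmetic-mean definition; the "perception entropy" here is an ordinary spectral entropy and the subband partition is a plain linear split, both stand-ins for the paper's psychoacoustic definitions, and the training data is synthetic.

```python
# Hedged sketch of a subband spectral-feature extractor feeding an SVM classifier.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def subband_features(y, sr, n_bands=4):
    """Per-subband spectral entropy and flatness, averaged over frames."""
    S = np.abs(librosa.stft(y, n_fft=512)) ** 2              # power spectrogram
    feats = []
    for band in np.array_split(S, n_bands, axis=0):           # crude linear subbands
        p = band / (band.sum(axis=0, keepdims=True) + 1e-12)
        entropy = -(p * np.log2(p + 1e-12)).sum(axis=0)        # spectral entropy
        flatness = np.exp(np.mean(np.log(band + 1e-12), axis=0)) / (band.mean(axis=0) + 1e-12)
        feats += [entropy.mean(), flatness.mean()]
    return np.array(feats)


# Toy training data: two synthetic "emotion" classes with different spectra.
sr = 16000
t = np.arange(sr) / sr
X = np.array([subband_features(np.sin(2 * np.pi * f * t) + 0.05 * np.random.randn(sr), sr)
              for f in [150, 160, 170, 400, 420, 440]])
labels = np.array([0, 0, 0, 1, 1, 1])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, labels)
print(clf.predict(X))
```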