An architecture for incremental information fusion of cross-modal representations

Baumgärtner, Christopher; Beuck, Niels; Menzel, Wolfgang

doi:10.1109/mfi.2012.6343045

Cited by 4 publications

(1 citation statement)

References 4 publications

“…Another challenge in speech emotion classification is the fusion of the multiple features. A number of previous researches [14,15,16,17,18,19,20] have been reported which focused on major fusion strategies. While most of the above mentioned fusion methods yielded good performance, they almost simply concatenated the multiple features into a single high-dimensional feature vector and fed it into a final classifier or a shallow fusion model which has difficulty in joining learning intrinsic correlations between different acoustic feature representations.…”

Section: Introduction (mentioning)

confidence: 99%

Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network

Jiang, Wang, Jin, et al. 2019

Sensors

Automatic speech emotion recognition is a challenging task because of the gap between acoustic features and human emotions, and it relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture that extracts informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that lowers emotion recognition performance. After the informative features are obtained, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% compared to existing state-of-the-art approaches.

A Multi-modal Data-Set for Systematic Analyses of Linguistic Ambiguities in Situated Contexts

Alaçam, Staron, Menzel 2018

Information Management and Big Data

Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

2021

Recently, identifying speech emotions in spontaneous databases has been a complex and demanding area of study. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions with multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) algorithm is used to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We investigated our approach using the eNTERFACE05 and BAUM-1s benchmark databases and observed a significant identification accuracy of 76% for a speaker-independent experiment with SVM and 59% accuracy with, respectively. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the suggested framework outperformed current state-of-the-art techniques on the semi-natural and spontaneous datasets.
