Semisupervised Autoencoders for Speech Emotion Recognition

Deng, Jun; Xu, Xinzhou; Zhang, Zixing; Frühholz, Sascha; Schuller, Björn

doi:10.1109/taslp.2017.2759338

Cited by 132 publications (72 citation statements)

References 44 publications (60 reference statements)


“…Similarly, Cummins et al [43] utilised CNNs pre-trained on large amounts of image data to extract robust feature representations for speech-based emotion recognition. More recently, neural-network-based semi-supervised learning has been introduced to leverage large-scale unlabelled data [13], [44].…”

Section: B Sparsity Of Collected Data (mentioning)

confidence: 99%

Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives [Review Article]

Han, Zhang, Schuller (2019), IEEE Comput. Intell. Mag.

Self Cite


Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. As a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms are explained and discussed accordingly, aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities.


“…In the attribute-learning phase, we consider three approaches for fi(•)s, namely, shallow-structure Multi-Layer Perceptron (MLP) networks [27], Support Vector Regression (SVR) [28], and Ridge Regression (RR) [15]. The MLP consists of a two-hidden-layer structure, with 12 selections of the hidden-layer neurons as: (32, 8), (32, 16), (64, 16), . .…”

Section: Learning Approaches (mentioning)

confidence: 99%

“…Prominent directions concerning SER have focused on diverse topics such as data collection [3], data enrichment [4], deep learning [5], feature enhancement [6], and transfer learning [7]. Nevertheless, most current research on SER is, arguably, focused on the passive learning of emotional states, which need fully, or at least partially labelled training samples to learn reasonable models [8]. Furthermore, passive approaches are unsuitable for labelling samples which do not have an adequate amount of matched training data.…”

Section: Introduction (mentioning)

confidence: 99%

Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition

Deng, Cummins, et al. (2019), Interspeech 2019

Self Cite


Conventionally, speech emotion recognition is achieved using passive learning approaches. Differing from such approaches, we herein propose and develop a dynamic method of autonomous emotion learning based on zero-shot learning. The proposed methodology employs emotional dimensions as the attributes in the zero-shot learning paradigm, resulting in two phases of learning, namely attribute learning and label learning. Attribute learning connects the paralinguistic features and attributes utilising speech with known emotional labels, while label learning aims at defining unseen emotions through the attributes. The experimental results achieved on the CINEMO corpus indicate that zero-shot learning is a useful technique for autonomous speech-based emotion learning, achieving accuracies considerably better than chance level and an attribute-based gold-standard setup. Furthermore, different emotion recognition tasks, emotional attributes, and employed approaches strongly influence system performance.


“…In this way, the neural network becomes a "black-box" analysis technique for speech emotion feature extraction. Nevertheless, acoustic (or any other) analysis of the speech signal still remains relevant topic in speech emotion recognition [13,28,55].…”

Section: Introduction (mentioning)

confidence: 99%

Speech emotion classification using fractal dimension-based features

Tamulevičius, Karbauskaitė, Dzemyda (2019), NAMC


During the last 10–20 years, a great many new ideas have been proposed to improve the accuracy of speech emotion recognition: e.g., effective feature sets, complex classification schemes, and multi-modal data acquisition. Nevertheless, speech emotion recognition is still a task with limited success. Considering the nonlinear and fluctuating nature of emotional speech, in this paper we present fractal dimension-based features for speech emotion classification. We employed Katz, Castiglioni, Higuchi, and Hurst exponent-based features and their statistical functionals to establish the 224-dimensional full feature set. The dimension was reduced by applying the Sequential Forward Selection technique. The results of the experimental study show a clear superiority of the fractal dimension-based feature sets over the acoustic ones. An average accuracy of 96.5% was obtained using the reduced feature sets. The feature selection enabled us to obtain 4-dimensional and 8-dimensional sets for Lithuanian and German emotions, respectively.

