2018
DOI: 10.1109/taslp.2017.2759338

Semisupervised Autoencoders for Speech Emotion Recognition

Abstract: Despite the widespread use of supervised learning methods for speech emotion recognition, they are severely restricted due to the lack of sufficient amount of labelled speech data for the training. Considering the wide availability of unlabelled speech data, therefore, this paper proposes semisupervised autoencoders to improve speech emotion recognition. The aim is to reap the benefit from the combination of the labelled data and unlabelled data. The proposed model extends a popular unsupervised autoencoder by…

Cited by 132 publications (72 citation statements)
References 44 publications (60 reference statements)
“…Similarly, Cummins et al [43] utilised CNNs pre-trained on large amounts of image data to extract robust feature representations for speech-based emotion recognition. More recently, neural-network-based semi-supervised learning has been introduced to leverage large-scale unlabelled data [13], [44].…”
Section: B Sparsity Of Collected Data
type: mentioning
confidence: 99%
“…In the attribute-learning phase, we consider three approaches for f_i(·)s, namely, shallow-structure Multi-Layer Perceptron (MLP) networks [27], Support Vector Regression (SVR) [28], and Ridge Regression (RR) [15]. The MLP consists of a two-hidden-layer structure, with 12 selections of the hidden-layer neurons as: (32, 8), (32, 16), (64, 16), …”
Section: Learning Approaches
type: mentioning
confidence: 99%
“…Prominent directions concerning SER have focused on diverse topics such as data collection [3], data enrichment [4], deep learning [5], feature enhancement [6], and transfer learning [7]. Nevertheless, most current research on SER is, arguably, focused on the passive learning of emotional states, which need fully, or at least partially labelled training samples to learn reasonable models [8]. Furthermore, passive approaches are unsuitable for labelling samples which do not have an adequate amount of matched training data.…”
Section: Introduction
type: mentioning
confidence: 99%
“…In this way, the neural network becomes a "black-box" analysis technique for speech emotion feature extraction. Nevertheless, acoustic (or any other) analysis of the speech signal still remains relevant topic in speech emotion recognition [13,28,55].…”
Section: Introduction
type: mentioning
confidence: 99%