Interspeech 2019
DOI: 10.21437/interspeech.2019-2561

Data Augmentation Using GANs for Speech Emotion Recognition

Abstract: In this work, we address the problem of data imbalance for the task of Speech Emotion Recognition (SER). We investigate conditioned data augmentation using Generative Adversarial Networks (GANs), in order to generate samples for underrepresented emotions. We adapt and improve a conditional GAN architecture to generate synthetic spectrograms for the minority class. For comparison purposes, we implement a series of signal-based data augmentation methods. The proposed GAN-based approach is evaluated on two dataset…
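To make the imbalance problem concrete, here is a minimal sketch of how one might count the synthetic samples each emotion class would need in order to match the largest class. The function name `synthetic_counts` and the label distribution are hypothetical illustrations, not taken from the paper or its datasets, and balancing up to the majority class is only one possible target.

```python
from collections import Counter


def synthetic_counts(labels):
    """Return, per class, how many extra samples are needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label distribution (assumption, not the paper's data).
labels = ["neutral"] * 50 + ["happy"] * 30 + ["angry"] * 35 + ["sad"] * 10
needed = synthetic_counts(labels)
print(needed)  # {'neutral': 0, 'happy': 20, 'angry': 15, 'sad': 40}
```

Under this assumption, a generator (GAN-based or signal-based) would be asked for 40 additional "sad" samples, 20 "happy", and 15 "angry", leaving the majority class untouched.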

Cited by 106 publications (70 citation statements)
References 24 publications
“…A black dot (•) in a cell means the corresponding database was used in the research mentioned at the bottom of the column.
Year: 2005, 2010, 2011, 2013, 2014, 2016, 2017, 2018, 2019, 2020
Research: HMM, SVM [6]; SVM [17]; GerDA, RBM [22]; LSTM, BLSTM [28]; CRF, CRBM [24]; SVM, PCA, LPP, TSL [90]; DNN, ANN, ELM [23]; DCNN, LSTM [29]; CNN [21]; DCNN [26]; LSTM, MTL [33]; ANN, PSOF [19]; DCNN, DTPM, TSL [25]; LSTM, VAE [31]; GAN [86]; GAN, SVM [88]; LSTM, ATTN [94]; DCNN, LSTM [30]; CNN, VAE, DAE, AAE, AVB [32]; DCNN, GAN [89]; LDA, TSL, TLSL [91]; CNN, BLSTM, ATTN, MTL [95]; LSTM, ATTN [83]; DNN, Generative [76]; DCNN [79]
Additionally, Figure 2a shows a comparison between the accuracies reported by deep learning methods on EMO-DB versus IEMOCAP, where we can see a clear separation between the published accuracies. Again, one reason could be that EMO-DB has an order of magnitude fewer samples than IEMOCAP, which makes deep learning methods trained on it more prone to overfitting.…”
Section: Discussion (mentioning)
confidence: 99%
“…Later, in 2019, Chatziagapi et al. [89] utilized a GAN as a conditioned data augmentation tool to overcome the data imbalance problem of SER systems by generating synthetic spectrograms for the minority classes. In their experiments, the GAN approach, a fully convolutional architecture, outperformed signal-based augmentation methods such as CP, CA, etc.…”
Section: Emotion Recognition Methods (mentioning)
confidence: 99%
“…Therefore, they used adversarial auto-encoders to synthetically generate samples to classify emotion. Chatziagapi et al. [23] investigated conditioned data augmentation using GANs to address the problem of data imbalance for the task of Speech Emotion Recognition (SER). The authors adapted the conditional GAN architecture to generate synthetic spectrograms for the minority class.…”
Section: B Data Augmentation Based On Generative Adversarial Network (mentioning)
confidence: 99%
“…Although GANs have been used in speech applications such as voice conversion [17], speech enhancement [18], [19], and text-to-speech synthesis [20], only recently have they been adopted for data augmentation of audio and speech signals, e.g., for sound classification [21], [22] and speech emotion recognition [23]-[25]. To the best of our knowledge, this work is the first to propose GAN-based data augmentation for depression severity estimation from speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another study with more focus on the improvement of SER was done by Chatziagapi et al. [142]. They adopt a CGAN called Balancing GAN (BAGAN) [165] and improve it to generate synthetic spectrograms for the minority or under-represented emotion classes.…”
Section: Purpose or Characteristic (mentioning)
confidence: 99%
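As a rough illustration of what "conditioned" generation means in the statements above, the sketch below builds the input vector a class-conditional generator might receive: a random noise vector concatenated with a one-hot encoding of the target emotion. This is the common textbook way of conditioning a GAN and is an assumption here; the statements do not describe how BAGAN or the authors' adapted model injects the class label. The emotion list, `noise_dim`, and the function names are hypothetical.

```python
import random

# Hypothetical label set (assumption, not taken from the paper).
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def one_hot(label, classes):
    """Encode a class label as a one-hot list of floats."""
    return [1.0 if c == label else 0.0 for c in classes]


def generator_input(label, noise_dim=8, rng=None):
    """Concatenate Gaussian noise with the one-hot class label."""
    rng = rng or random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, EMOTIONS)


vec = generator_input("sad", noise_dim=8, rng=random.Random(0))
print(len(vec), vec[-4:])  # 12 [0.0, 0.0, 0.0, 1.0]
```

Requesting samples for a minority class then amounts to fixing the label part of the vector and redrawing only the noise part.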