ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053581
Multi-Conditioning and Data Augmentation Using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions

Cited by 28 publications (20 citation statements)
References 14 publications
“…The linear projection layer predicts the emotion class possibility from the utterance-level emotional features. We perform data augmentation by adding white Gaussian noise to improve the robustness of SER ( [122], [123], [124], [125]).…”
Section: Speech Emotion Recognizer
confidence: 99%
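The augmentation described in this excerpt — adding white Gaussian noise to improve SER robustness — can be sketched as follows. This is a minimal illustration, not the citing paper's implementation: the function name and the SNR-based scaling are assumptions.

```python
import numpy as np

def add_white_noise(waveform: np.ndarray, snr_db: float) -> np.ndarray:
    """Add white Gaussian noise to a waveform at a target SNR in dB.

    Hypothetical helper for illustration: scales the noise power so the
    resulting signal-to-noise ratio matches snr_db.
    """
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise
```

In practice the target SNR is typically drawn at random per utterance so the model sees a range of noise levels during training.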
“…A black dot (•) in a cell means the corresponding database was used in the research mentioned at the bottom of the column. [Flattened table: databases used per research work, years 2005–2020; methods surveyed include HMM, SVM [6], [17], GerDA, RBM [22], LSTM/BLSTM [28], CRF, CRBM [24], SVM with PCA/LPP/TSL [90], DNN, ANN, ELM [23], DCNN variants [21], [25], [26], [29], [30], [79], LSTM with MTL [33], ANN, PSOF [19], VAE/DAE/AAE/AVB [31], [32], GAN-based models [86], [88], [89], attention-based models [83], [94], [95], LDA, TSL, TLSL [91], and DNN, Generative [76].] Additionally, Figure 2a compares the accuracies reported by deep learning methods on EMO-DB versus IEMOCAP, showing a clear separation between the published accuracies. Again, one reason could be that EMO-DB has an order of magnitude fewer samples than IEMOCAP, which makes deep learning methods trained on it more prone to overfitting.…”
Section: Discussion
confidence: 99%
“…Lately, Tiwari et al. [76] address the noise robustness of SER in the presence of additive noise by employing an utterance-level parametric generative noise model. Their deep neural network framework is useful for defeating unseen noise since the generated noise can cover the entire noise space in the Mel filter bank energy domain.…”
Section: Emotion Recognition Methods
confidence: 99%
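The excerpt above notes that the cited method generates noise in the Mel filter-bank energy domain. As an illustrative sketch only — the per-band Gaussian parametrization and function name below are assumptions, not the authors' generative model — corrupting log Mel energies amounts to sampling a noise level per band and summing with the clean energies in the linear domain, where additive noise actually combines:

```python
import numpy as np

def augment_mel_energies(clean_fbe: np.ndarray, rng: np.random.Generator,
                         noise_mean: float = -5.0,
                         noise_std: float = 1.0) -> np.ndarray:
    """Corrupt log Mel filter-bank energies with a sampled noise floor.

    clean_fbe: array of shape (frames, bands) holding natural-log energies.
    A per-band log noise level is drawn from a Gaussian (a hypothetical
    parametrization for illustration); clean and noise energies are then
    added in the linear domain and mapped back to the log domain.
    """
    noise_log = rng.normal(noise_mean, noise_std, size=clean_fbe.shape[1])
    return np.log(np.exp(clean_fbe) + np.exp(noise_log))
```

Because the noise energy is strictly positive, the augmented log energies can only move upward from the clean ones, mimicking how an additive noise floor masks low-energy regions of the spectrum.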