Interspeech 2022
DOI: 10.21437/interspeech.2022-10667

Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Abstract: Speech Emotion Recognition (SER) is crucial for human-computer interaction but still remains a challenging problem because of two major obstacles: data scarcity and imbalance. Many datasets for SER are substantially imbalanced, where data utterances of one class (most often Neutral) are much more frequent than those of other classes. Furthermore, only a few data resources are available for many existing spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet ne…

Citations: cited by 2 publications (3 citation statements)
References: 38 publications (62 reference statements)
“…Generating spectrograms or raw waveforms provides more flexibility by allowing us to train models directly on the raw data. Chatziagapi et al [4] and Wang et al [5] proposed generating mel spectrograms using GANs to tackle data imbalance by augmenting the minority classes. Similarly Eskimez et al [16] used an improved version of GANs with higher generation quality to apply SER data augmentation using spectrograms.…”
Section: Related Work (mentioning)
confidence: 99%
“…Synthetic data is artificially generated data, which can be used to replace or augment real data in training deep learning models. Such approach has multiple advantages in terms of data privacy and security [3], balancing skewed datasets [4,5], as well as overcoming the lack of large datasets, as the case with SER [6]. The quality and realism of synthetic data is critical for its effectiveness in deep learning applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…Data augmentation is an effective method to solve this problem [16]. For example, generative adaptive networks (GANs) and its variants are often applied to generate new samples [17][18][19]. Alternatively, a larger data can be directly constructed from existing data with hand-crafted features [20].…”
Section: Introduction (mentioning)
confidence: 99%