2022
DOI: 10.1007/s11760-022-02156-9

Speech emotion recognition using data augmentation method by cycle-generative adversarial networks

Abstract: One of the obstacles in developing speech emotion recognition (SER) systems is the data scarcity problem, i.e., the lack of labeled data for training these systems. Data augmentation is an effective method for increasing the amount of training data. In this paper, we propose a cycle generative adversarial network (Cycle-GAN) for data augmentation in the SER systems. For each of the five emotions considered, an adversarial network is designed to generate data that has a similar distribution to the main data in …

Citations: cited by 19 publications (12 citation statements)
References: 31 publications
“…They found that data augmentation is very helpful in building speech recognition systems. Other articles have also been published on improving emotion recognition rate using gender classification [15], extracting resistant speech features [16], and data augmentation using Cycle-Gans [17].…”
Section: The Related Studies (mentioning)
confidence: 99%
“…However, non-basic emotions account for the majority of emotion manifestations in human-to-human communication. Furthermore, the majority of existing emotion recognition systems are unimodal: the system only processes speech data or face images [31]. In recent years, multimodal affect analysis has received a lot of attention, however, a very limited research has been done to exploit the audio-visual cues for emotion recognition tasks.…”
Section: Related Work, A. Unimodal Emotion Recognition (mentioning)
confidence: 99%
“…Also, various data augmentation strategies have been successfully adopted for the same purpose, e.g. [36], [37]. On the other hand, the application of dimensionality reduction transformations to the model's input data is an established strategy for reducing resource demands while limiting the loss of useful information carried by the input data.…”
Section: Introduction (mentioning)
confidence: 99%