2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP) 2020
DOI: 10.1109/mmsp48831.2020.9287156

An Evolutionary-based Generative Approach for Audio Data Augmentation

Abstract: In this paper, we introduce a novel framework to augment raw audio data for machine learning classification tasks. For the first part of our framework, we employ a generative adversarial network (GAN) to create new variants of the audio samples that are already existing in our source dataset for the classification task. In the second step, we then utilize an evolutionary algorithm to search the input domain space of the previously trained GAN, with respect to predefined characteristics of the generated audio. …
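
The two-step idea in the abstract (train a generator, then search its input space with an evolutionary algorithm) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `toy_generator` is a fixed random mapping standing in for a trained GAN, `fitness` uses RMS loudness as a hypothetical "predefined characteristic", and the elitist mutation-only loop is just one simple evolutionary strategy.

```python
import numpy as np

LATENT_DIM = 16   # hypothetical latent size, not taken from the paper
N_SAMPLES = 256   # hypothetical waveform length in samples

# Fixed random weights: a stand-in for a trained generator (assumption).
_W = np.random.default_rng(0).normal(size=(N_SAMPLES, LATENT_DIM)) / np.sqrt(LATENT_DIM)


def toy_generator(z):
    """Map a latent vector to a waveform in [-1, 1] (placeholder for a GAN)."""
    return np.tanh(_W @ z)


def fitness(z, target_rms=0.5):
    """Hypothetical audio characteristic: RMS close to target_rms (higher is better)."""
    audio = toy_generator(z)
    rms = np.sqrt(np.mean(audio ** 2))
    return -abs(rms - target_rms)


def evolve(pop_size=20, n_parents=5, generations=30, sigma=0.3, seed=1):
    """Elitist evolutionary search over latent vectors with Gaussian mutation."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, LATENT_DIM))
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        scores = np.array([fitness(z) for z in population])
        order = np.argsort(scores)[::-1]          # best individuals first
        history.append(scores[order[0]])
        parents = population[order[:n_parents]]   # survivors are kept unchanged
        picks = rng.integers(0, n_parents, size=pop_size - n_parents)
        noise = sigma * rng.normal(size=(pop_size - n_parents, LATENT_DIM))
        children = parents[picks] + noise         # mutated copies of parents
        population = np.vstack([parents, children])
    scores = np.array([fitness(z) for z in population])
    history.append(scores.max())
    return population[np.argmax(scores)], history


best_z, history = evolve()
augmented_sample = toy_generator(best_z)  # candidate sample for augmentation
```

Because the parents survive each generation unchanged, the best fitness never decreases. In a real setting the generator would be the trained GAN and the fitness function would encode whatever audio characteristics the augmentation task calls for.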

Cited by 21 publications (15 citation statements)
References 19 publications
“…For the DEEPSPECTRUM features, we extract a 2,560 dimensional feature set of deep data-representations using the DEEPSPECTRUM toolkit (Amiriparian et al., 2017). DEEPSPECTRUM has shown success for various audio- and speech-based tasks (Mertes et al., 2020), and extracts features from the audio data using pre-trained convolutional neural networks. For this study, we extract features based on the viridis colour map, and the deep features are extracted from the layer fc7 of AlexNet (Krizhevsky et al., 2012).…”
Section: Features
Citation type: mentioning
confidence: 99%
“…LVE was applied to different domains, such as video games [30] or fingerprint-based biometric systems [4]. It was also deployed successfully for searching through the latent space of a WaveGAN for the purpose of augmenting datasets [21]. Further, LVE was used to give human users the ability to interactively evolve through a learned GAN space [3,31].…”
Section: GANs and Controllability
Citation type: mentioning
confidence: 99%
“…We utilise WAVEGAN to generate new audio data, as first proposed in [4]. We have chosen WAVEGAN, as it shows promise for a range of audio generation tasks, in the domain of emotional speech [21], and music [4], as well as being successfully adapted for the task of data augmentation [22]. Typically, in a GAN paradigm, a generator produces new samples and competes against the discriminator, attempting to classify the instances as fake or real.…”
Section: Audio Generation
Citation type: mentioning
confidence: 99%
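
For reference, the generator/discriminator competition described in the last statement is commonly written as the standard GAN minimax objective shown below. This is the textbook formulation, given here only as an illustration; the cited works may train with a different loss variant. The symbols are the usual ones and are not taken from the page: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real samples, and $p_z$ the prior over latent vectors.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximises this value by scoring real samples high and generated ones low, while the generator minimises it by producing samples that the discriminator scores as real.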