2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8462581

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

Abstract: We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work [1] demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however, this technique was not justified in the context of ASR. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated …

Cited by 186 publications (142 citation statements)
References 30 publications (43 reference statements)

“…GAN-based training for SE [14] has received increased attention. GAN is employed to constrain the estimated signals close to the clean signals, which was shown to improve objective and subjective SE criterion, but it did not contribute to improvement in terms of ASR [26].…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, there are many other studies on auditory data which work on audio spectrograms and consider them as 2D images. For instance, Donahue et al [86] as well as Michelsanti, Tan et al [87] employ GAN on audio spectrograms for speech enhancement. Fan et al [88] propose a GAN for separating the singing voice from background music.…”
Section: Related Work (mentioning)
confidence: 99%
“…Intentional noise has been added to machine translation data [9,10]. Alternate methods for collecting large scale audio data include Generative Adversarial Networks [11] and manual recording [12].…”
Section: Spoken Question Answering Datasets (mentioning)
confidence: 99%