iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning

Li, Haoyu; Fu, Szu-Wei; Yu, Tao; Yamagishi, Junichi

doi:10.21437/interspeech.2020-1016

Cited by 15 publications

(13 citation statements)

References 17 publications


“…Moreover, the signal example ŷ, which is pre-enhanced using other reference algorithms (e.g., SSDRC [7] and OptSII [5]), is also fed into D_int in the training. As demonstrated in our earlier study [28], learning such additional examples can stabilize the training process and improve performance. Given all the above notations, the loss function of D_int is represented as follows:…”

Section: B System Overview (mentioning)

confidence: 66%

“…We comprehensively evaluate the system's performance under different conditions with unseen noises and reverberations. Our experiments show that the improved system significantly increases the intelligibility and quality of speech over our original system [28] with far less parameters. Moreover, it also outperforms the state-of-the-art SSDRC baseline in both objective and subjective evaluations.…”

Section: Introduction (mentioning)

confidence: 80%

“…Thus, the number of nodes are accordingly set to 2. Similar to our previous study [28], we apply spectral normalization with 1-Lipschitz continuity [43] to all the layers used in D_qua to stabilize the training process.…”

Section: Network Architectures (mentioning)

confidence: 99%
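
The statement above mentions spectral normalization as a way to keep each discriminator layer 1-Lipschitz. As a rough illustration only, and not the cited authors' implementation, the idea can be sketched for a single dense weight matrix: estimate the largest singular value by power iteration, then divide the weights by it. The function name `spectral_normalize`, the helper names, and the fixed iteration count are assumptions made for this example.

```python
import math


def _matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _mat_t_vec(W, u):
    # Multiply the transpose of W by vector u.
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def _normalize(v):
    # Scale v to unit Euclidean length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_normalize(W, n_iter=50):
    """Return (W / sigma, sigma), where sigma estimates the largest singular value of W.

    Hypothetical helper for illustration; assumes W is a non-zero dense matrix.
    """
    v = _normalize([1.0] * len(W[0]))
    for _ in range(n_iter):
        u = _normalize(_matvec(W, v))      # left singular vector estimate
        v = _normalize(_mat_t_vec(W, u))   # right singular vector estimate
    sigma = sum(a * b for a, b in zip(u, _matvec(W, v)))
    return [[w / sigma for w in row] for row in W], sigma


W = [[3.0, 0.0], [0.0, 1.0]]
W_sn, sigma = spectral_normalize(W)
```

After normalization the largest singular value of the matrix is approximately 1, which bounds the Lipschitz constant of the corresponding linear map by 1. Deep learning frameworks typically run one power-iteration step per training update and start from a random vector instead of iterating to convergence as done here.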

“…Inspired by progresses in black-box function optimization [26], [27], we previously proposed a generative adversarial network (GAN)-based system [28] for near-end intelligibility enhancement. The system was composed of a generator that enhances the intelligibility of input speech and a discriminator that acts as a learned surrogate of evaluation metrics to guide the training scheme of the generator.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this paper, we propose a causal and light-weight system as an extension to our earlier system [28]. We substitute the original bidirectional long-short term memory (BLSTM) with causal convolution.…”

Section: Introduction (mentioning)

confidence: 99%
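
The statement above contrasts a BLSTM, which reads the whole utterance in both directions, with causal convolution, where each output depends only on current and past inputs. A minimal sketch of a one-dimensional causal convolution follows; the function name `causal_conv1d` and the list-based signal representation are assumptions for illustration and do not reproduce the cited system's layers.

```python
def causal_conv1d(x, kernel):
    """Compute y[t] = sum_k kernel[k] * x[t - k], treating x[t - k] as 0 when t - k < 0.

    Hypothetical helper: x and kernel are plain lists of floats.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:          # only current and past samples contribute
                acc += w * x[t - k]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0]
y = causal_conv1d(x, [0.5, 0.5])
```

Because no future sample enters any output, changing a later input leaves all earlier outputs unchanged, which is the property that makes frame-by-frame real-time processing possible.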


Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

Li, Yamagishi

2021

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


The intelligibility of speech severely degrades in the presence of environmental noise and reverberation. In this paper, we propose a novel deep learning based system for modifying the speech signal to increase its intelligibility under the equal-power constraint, i.e., signal power before and after modification must be the same. To achieve this, we use generative adversarial networks (GANs) to obtain time-frequency dependent amplification factors, which are then applied to the input raw speech to reallocate the speech energy. Instead of optimizing only a single, simple metric, we train a deep neural network (DNN) model to simultaneously optimize multiple advanced speech metrics, including both intelligibility- and quality-related ones, which results in notable improvements in performance and robustness. Our system can not only work in non-realtime mode for offline audio playback but also support practical real-time speech applications. Experimental results using both objective measurements and subjective listening tests indicate that the proposed system significantly outperforms state-of-the-art baseline systems under various noisy and reverberant listening conditions.
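
The equal-power constraint described in the abstract can be illustrated with a small sketch: apply per-bin amplification factors, then rescale the result so that its total power matches the input. This is an assumption-laden toy, not the paper's system: the signal is represented as a grid of time-frequency magnitudes, and the names `apply_gains_equal_power` and `total_power` are hypothetical.

```python
import math


def total_power(spec):
    # Sum of squared magnitudes over all time-frequency bins.
    return sum(m * m for frame in spec for m in frame)


def apply_gains_equal_power(spec, gains):
    """Apply per-bin gains, then rescale so the output power equals the input power.

    spec and gains are lists of frames, each frame a list of non-negative floats
    with matching shapes (an assumed representation for this example).
    """
    modified = [[m * g for m, g in zip(frame, g_frame)]
                for frame, g_frame in zip(spec, gains)]
    power_out = total_power(modified)
    if power_out == 0.0:
        return modified  # nothing to rescale
    scale = math.sqrt(total_power(spec) / power_out)
    return [[m * scale for m in frame] for frame in modified]


spec = [[1.0, 2.0], [3.0, 4.0]]
gains = [[2.0, 0.5], [1.0, 3.0]]
out = apply_gains_equal_power(spec, gains)
```

The rescaling preserves the relative emphasis chosen by the gains, so energy is reallocated across bins while the overall signal power stays the same as before modification.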
