ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053164

A Recurrent Variational Autoencoder for Speech Enhancement

Abstract: This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with pre…
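The abstract pairs a learned clean-speech model with a nonnegative matrix factorization (NMF) noise model. The NMF half can be illustrated on its own; below is a minimal sketch of factoring a noise power spectrogram with Lee-Seung multiplicative updates under the Euclidean cost (the paper's exact divergence and variable names may differ — `nmf`, `W`, `H`, and the toy data here are illustrative, not taken from the paper):

```python
import numpy as np

def nmf(V, rank=4, n_iter=200, eps=1e-10, seed=0):
    # Factor a nonnegative matrix V (freq x time) as V ~ W @ H
    # using Lee-Seung multiplicative updates for the Euclidean cost.
    # eps guards against division by zero and keeps factors positive.
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "noise power spectrogram": squared magnitudes, so nonnegative.
V = np.abs(np.random.default_rng(1).standard_normal((64, 100))) ** 2
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the enhancement setting sketched in the abstract, `W @ H` would model the noise power spectrum while the RVAE models clean speech, and the two are combined in a variational EM loop; that loop is not reproduced here.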

Cited by 71 publications (79 citation statements)
References 31 publications (75 reference statements)
“…From table 6, in comparison with the other methods including wavelet [12], time-frequency filter bank [13], ICA [17], and VAE [25], our model gets a high result. We achieve 12.99dB in SDR measure, which is the highest score, and 15.02dB in SIR.…”
Section: A Test Case With the Whole TIMIT Dataset
Mentioning confidence: 89%
“…The result by only BPF is much lower than only VAE in terms of both SDR, SIR, and PESQ. If these two components are combined to form the full model, it achieves a higher PESQ than the result at [25].…”
Section: A Test Case With the Whole TIMIT Dataset
Mentioning confidence: 99%
“…Because of their success, VAE is extended for speech processing. For example, in [25] VAE is used for modeling the magnitude spectrogram (STFT) for speech enhancement. For instance, the authors in [26] propose a new sequence to sequence model, an RNN semantic variational autoencoder (RNN-SVAE).…”
Section: Introduction
Mentioning confidence: 99%
“…Often, such methods are implemented prior to the feature extraction as an enhancement of the speech signal [4][5][6]. Methods working at the feature level may or may not be a part of the ASR system, as in the case of a deep denoising autoencoder [7][8][9]. Another approach is the joint training of the feature extractor and the acoustic model.…”
Section: Introduction
Mentioning confidence: 99%