ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053164

A Recurrent Variational Autoencoder for Speech Enhancement

Abstract: This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with pre…
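The abstract pairs a learned clean-speech model with a nonnegative matrix factorization (NMF) noise model. The NMF half can be illustrated on its own; below is a minimal sketch of factoring a noise power spectrogram with Lee-Seung multiplicative updates under the Euclidean cost (the paper's exact divergence and variable names may differ — `nmf`, `W`, `H`, and the toy data here are illustrative, not taken from the paper):

```python
import numpy as np

def nmf(V, rank=4, n_iter=200, eps=1e-10, seed=0):
    # Factor a nonnegative matrix V (freq x time) as V ~ W @ H
    # using Lee-Seung multiplicative updates for the Euclidean cost.
    # eps guards against division by zero and keeps factors positive.
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "noise power spectrogram": squared magnitudes, so nonnegative.
V = np.abs(np.random.default_rng(1).standard_normal((64, 100))) ** 2
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the enhancement setting sketched in the abstract, `W @ H` would model the noise power spectrum while the RVAE models clean speech, and the two are combined in a variational EM loop; that loop is not reproduced here.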

Cited by 71 publications (79 citation statements)
References 31 publications (75 reference statements)
“…From table 6, in comparison with the other methods including wavelet [12], time-frequency filter bank [13], ICA [17], and VAE [25], our model gets a high result. We achieve 12.99dB in SDR measure, which is the highest score, and 15.02dB in SIR.…”
Section: A Test Case With the Whole TIMIT Dataset
Mentioning confidence: 89%
“…The result by only BPF is much lower than only VAE in terms of both SDR, SIR, and PESQ. If these two components are combined to form the full model, it achieves a higher PESQ than the result at [25].…”
Section: A Test Case With the Whole TIMIT Dataset
Mentioning confidence: 99%
“…Because of their success, VAE is extended for speech processing. For example, in [25] VAE is used for modeling the magnitude spectrogram (STFT) for speech enhancement. For instance, the authors in [26] propose a new sequence to sequence model, an RNN semantic variational autoencoder (RNN-SVAE).…”
Section: Introduction
Mentioning confidence: 99%
“…Often, such methods are implemented prior to the feature extraction as an enhancement of the speech signal [4][5][6]. Methods working at the feature level may or may not be a part of the ASR system, as in the case of a deep denoising autoencoder [7][8][9]. Another approach is the joint training of the feature extractor and the acoustic model.…”
Section: Introduction
Mentioning confidence: 99%