Interspeech 2022
DOI: 10.21437/interspeech.2022-10653
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

Cited by 36 publications (16 citation statements). References 0 publications.
“…Most similar to our proposed IR-SDE is the work of (Welker et al., 2022b; Richter et al., 2022), in which a mean-reverting SDE is applied to the speech processing tasks of speech enhancement and speech dereverberation. They utilize an SDE of the form (3) but with a different σ_t and a constant θ, i.e.…”
Section: Related Work
confidence: 99%
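The mean-reverting SDE mentioned above can be sketched numerically. The following is a minimal Euler–Maruyama simulation, assuming a constant θ and, for simplicity, a constant σ (the cited works use a time-varying σ_t); all parameter values here are illustrative, not taken from those papers:

```python
import numpy as np

def simulate_mean_reverting_sde(x0, mu, theta=1.5, sigma=0.5,
                                n_steps=1000, T=1.0, seed=0):
    """Euler-Maruyama simulation of the mean-reverting SDE
        dx_t = theta * (mu - x_t) dt + sigma dW_t,
    which drifts the initial state x0 toward the mean mu while
    injecting Gaussian noise along the way."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + theta * (mu - x) * dt + sigma * dw
    return x

# After time T the state is pulled most of the way toward mu:
x_T = simulate_mean_reverting_sde(x0=np.zeros(1000), mu=np.ones(1000))
```

For a constant θ, the mean of the process at time T is mu + (x0 − mu)·exp(−θ·T), so with θ = 1.5 and T = 1 the simulated state ends up roughly 78% of the way from x0 to mu, plus Gaussian noise.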
“…As shown in Section 5.3, both of these are outperformed by our cosine θ scheduler. Moreover, while (Welker et al., 2022b; Richter et al., 2022; Welker et al., 2022a) all use the standard score matching objective, we introduce an alternative maximum likelihood-based loss function that stabilizes training and leads to improved restoration performance. Finally, we demonstrate the general applicability of our approach by applying it to six diverse image restoration tasks.…”
Section: Related Work
confidence: 99%
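For illustration only, a cosine-shaped θ schedule could be parameterized as below. This is a hypothetical sketch (including the `theta_min`/`theta_max` endpoints); the exact schedule used in the cited IR-SDE work may differ:

```python
import numpy as np

def cosine_theta_schedule(t, T, theta_min=0.1, theta_max=2.0):
    """Hypothetical cosine interpolation for the stiffness theta_t:
    rises smoothly from theta_min at t=0 to theta_max at t=T."""
    return theta_min + 0.5 * (theta_max - theta_min) * (1.0 - np.cos(np.pi * t / T))

# Monotonically increasing schedule over 5 evenly spaced time points:
thetas = cosine_theta_schedule(np.linspace(0.0, 1.0, 5), T=1.0)
```

The cosine shape keeps θ nearly flat near both endpoints and changes fastest mid-trajectory, in contrast to a constant θ.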
“…This research area can be categorized into two groups: conditional methods, which require specialized training for specific problems, and zero-shot methods, which leverage priors from unconditional diffusion models. Within the category of conditional models, several works target speech enhancement [33]–[36], image deblurring [8], and JPEG reconstruction [37], among others. It may be noted that these methods all require pairs of clean/degraded samples and a carefully designed training data pipeline.…”
Section: B. Diffusion Models For Blind Inverse Problems
confidence: 99%
“…k and l denote the frequency bin and time frame indices. Note that the existing diffusion-based models can be extended to complex-valued data, such as STFT coefficients [24, 25].…”
Section: Diffusion-based Generative Model On Clean Speech Data
confidence: 99%
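A minimal sketch of handling complex STFT coefficients X[k, l] (frequency bin k, time frame l): a real-valued score network can consume them by stacking the real and imaginary parts as two channels. The window and frame parameters below are illustrative, not taken from the cited works:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive one-sided STFT: Hann-windowed frames -> rfft.
    Returns complex coefficients X[k, l] indexed by frequency
    bin k and time frame l."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[l * hop : l * hop + n_fft] * win
                       for l in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def to_real_channels(X):
    """Stack real and imaginary parts so a real-valued network
    can process the complex coefficients losslessly."""
    return np.stack([X.real, X.imag], axis=0)

x = np.random.default_rng(0).standard_normal(4096)
X = stft(x)                  # complex, shape (257, 29)
feats = to_real_channels(X)  # real, shape (2, 257, 29)
```

The two-channel representation is lossless: recombining the channels as `feats[0] + 1j * feats[1]` recovers the original complex coefficients exactly.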