2021
DOI: 10.48550/arxiv.2109.05172
Preprint

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

Abstract: Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training. This setting prohibits the use of real-world degraded speech data that may better represent the scenarios where such systems are used. In this paper, we explore methods that enable supervised speech enhancement systems to train on real-world degraded speech data. Specifically, we propose a semi-supervised approach for speech enhancement in which we first train a modified vect…

Citations: cited by 1 publication (1 citation statement)
References: 19 publications
“…Due to the data acquisition challenge mentioned above, it is desirable to leverage real conversational data without any supervision signals. For single-channel speech enhancement and separation tasks, various techniques to leverage unsupervised data have been investigated, including cycle-consistency-loss-based training [16,17], semi-supervised training [18,19], WavLM [20], and mixture invariant training [21]. However, no work has been done to leverage real conversational unsupervised data for the multi-channel speech separation modeling, where the model must learn the nonlinear correlation between the input channels while solving the permutation problem of the multiple output signals.…”
Section: Introduction (mentioning)
confidence: 99%