Interspeech 2018
DOI: 10.21437/interspeech.2018-1845

Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation

Abstract: State-of-the-art methods for monaural singing voice separation consist in estimating the magnitude spectrum of the voice in the short-time Fourier transform (STFT) domain by means of deep neural networks (DNNs). The resulting magnitude estimate is then combined with the mixture's phase to retrieve the complex-valued STFT of the voice, which is further synthesized into a time-domain signal. However, when the sources overlap in time and frequency, the STFT phase of the voice differs from the mixture's phase, whi…
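The baseline step described in the abstract (an estimated voice magnitude combined with the mixture's phase) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the toy random "STFTs", the array shape, and the use of the true voice magnitude as a stand-in for a DNN estimate are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy STFTs (frequency bins x frames); not data from the paper.
shape = (4, 5)
voice = rng.normal(size=shape) + 1j * rng.normal(size=shape)
accomp = rng.normal(size=shape) + 1j * rng.normal(size=shape)
accomp[:, :2] = 0.0  # frames 0-1: accompaniment silent, so no overlap there
mixture = voice + accomp

# Assumption: the true voice magnitude stands in for a DNN magnitude estimate.
voice_mag = np.abs(voice)

# Baseline reconstruction: estimated magnitude combined with the mixture's phase.
voice_est = voice_mag * np.exp(1j * np.angle(mixture))

# Per-bin error of the complex estimate against the true voice STFT.
error = np.abs(voice_est - voice)
no_overlap_error = error[:, :2].max()  # mixture phase equals voice phase here
overlap_error = error[:, 2:].max()     # mixture phase differs from voice phase

print(f"max error without overlap: {no_overlap_error:.2e}")
print(f"max error with overlap:    {overlap_error:.2e}")
```

Even with a perfect magnitude, the error in this toy setup is nonzero wherever the two sources overlap, which is the situation phase recovery is meant to address.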

Cited by 12 publications (12 citation statements)
References: 22 publications
“…As well, in this work, the importance of SS/enhancement of SD-based approaches when combined with the spectral phase has been pointed out. In fact, the phase information was proven to enhance speech quality when integrated into several SS processes (with NMF [15,16], with DNN [25,26,29], time-frequency masks [28]) but was never investigated in SD-based approaches. Hence, the proposed work is further support for phase importance for speech processing.…”
Section: Discussion (mentioning)
confidence: 99%
“…Hence, recent DNN-based approaches also exploit recent phase recovery algorithms that tackle this issue in order to further enhance the separation quality. To this end, phase-aware DNNs can exploit phase information in three main scenarios; the first one consists of two-stage approaches that are magnitude separation then phase enhancement [25].…”
Section: Related Work: Phase-aware Speech Processing For SCSS (mentioning)
confidence: 99%
“…Applying a nonnegative mask to the mixture's STFT results in assigning its phase to each isolated source. Even though this practice is common and yields satisfactory results, it is well established [8] that when sources overlap in the TF domain, using the mixture's phase induces residual interference and artifacts in the estimates. With the advent of deep learning, magnitudes can nowadays be estimated with a high accuracy, which outlines the need for more advanced phase recovery algorithms [9].…”
Section: Introduction (mentioning)
confidence: 99%
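The first claim in the statement above, that a nonnegative mask assigns the mixture's phase to the masked source, can be checked numerically. This is a small sketch under stated assumptions: the random mixture STFT and the strictly positive random mask are hypothetical and do not come from any of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mixture STFT (frequency bins x frames) and a strictly positive
# real-valued mask; both are illustrative assumptions.
mixture = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
mask = rng.uniform(0.1, 1.0, size=(3, 4))

# Masking scales each time-frequency bin by a positive real number.
source_est = mask * mixture

# Phase difference between the masked estimate and the mixture, per bin.
phase_diff = np.angle(source_est * np.conj(mixture))
max_phase_diff = np.abs(phase_diff).max()

# Only the magnitude is changed by the mask.
magnitude_ok = np.allclose(np.abs(source_est), mask * np.abs(mixture))

print(f"largest phase difference (rad): {max_phase_diff:.2e}")
print(f"magnitude scaled by mask: {magnitude_ok}")
```

Because the mask is real and positive, the phase difference is zero up to rounding; bins where a mask is exactly zero would have no meaningful phase at all.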
“…With the advent of deep learning, magnitudes can nowadays be estimated with a high accuracy, which outlines the need for more advanced phase recovery algorithms [9]. Consequently, a significant research effort has been put on phase recovery in DNN-based source separation, whether phase recovery algorithms are applied as a post-processing [8] or integrated within end-to-end systems for time-domain separation [10,11,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…The work of P. Magron was conducted while he was with Tampere University and supported by the Academy of Finland, project no. 290190. focused on phase recovery in DNN-based source separation, whether phase recovery algorithms are applied as a postprocessing [12], [13] or integrated within end-to-end systems for time-domain separation [14], [15], [16], [17], [18].…”
Section: Introduction (mentioning)
confidence: 99%