End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

Wang, Zhong-Qiu; Roux, Jonathan Le; Wang, DeLiang; Hershey, John R.

doi:10.21437/interspeech.2018-1629

Cited by 117 publications

(118 citation statements)

References 38 publications

(67 reference statements)

Supporting

Mentioning

116

Contrasting


“…A major difference from [31], [33] is that we do not perform power or logarithmic compression on the magnitude spectra. This way, the DNN is always trained to estimate an STFT spectrogram that has consistent phase and magnitude structure, and hence would likely produce a good consistent STFT spectrogram at run time [34], [35].…”

Section: Siso1-bf-siso2 System (mentioning)

confidence: 99%

Multi-Microphone Complex Spectral Mapping for Speech Dereverberation

Wang

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry. In the proposed approach, a deep neural network (DNN) is trained to predict the real and imaginary (RI) components of direct sound from the stacked reverberant (and noisy) RI components of multiple microphones. We also investigate the integration of multi-microphone complex spectral mapping with beamforming and post-filtering. Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach.
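As a rough illustration of the input described in this abstract, the sketch below stacks the real and imaginary parts of several microphones' STFT frames into one feature vector per frame. The function name `stack_ri_features`, the nested-list layout, and the ordering of the stacked components are assumptions made for illustration; the cited paper may arrange its features differently.

```python
from typing import List

# A spectrogram is indexed as [frame][frequency bin] (assumed layout).
Spectrogram = List[List[complex]]


def stack_ri_features(mics: List[Spectrogram]) -> List[List[float]]:
    """For each frame, concatenate the real and imaginary parts of every microphone."""
    n_frames = len(mics[0])
    features: List[List[float]] = []
    for t in range(n_frames):
        row: List[float] = []
        for spec in mics:
            row.extend(z.real for z in spec[t])  # real parts of this microphone
            row.extend(z.imag for z in spec[t])  # imaginary parts of this microphone
        features.append(row)
    return features


# Two hypothetical microphones, one frame, two frequency bins each.
mic_a = [[1 + 2j, 3 - 1j]]
mic_b = [[0.5j, -2 + 0j]]
stacked = stack_ri_features([mic_a, mic_b])
```

Each frame's feature vector has length 2 × (number of microphones) × (number of bins), which is what a network trained on stacked RI components would receive under this assumed layout.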


“…Motivated by the recent advance in deep learning, several DNN-based phase reconstruction methods have been presented [18][19][20][21][22][23]. However, phase reconstruction from a given amplitude spectrogram is not an easy task for DNNs due to the following two problems: the wrapping effect and sensitivity to a shift of a waveform.…”

Section: Phase Reconstruction Via DNN (mentioning)

confidence: 99%

Phase Reconstruction Based On Recurrent Phase Unwrapping With Deep Neural Networks

Masuyama, Yatabe, Oikawa, et al.

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Phase reconstruction, which estimates phase from a given amplitude spectrogram, is an active research field in acoustical signal processing with many applications including audio synthesis. To take advantage of rich knowledge from data, several studies presented deep neural network (DNN)-based phase reconstruction methods. However, the training of a DNN for phase reconstruction is not an easy task because phase is sensitive to the shift of a waveform. To overcome this problem, we propose a DNN-based two-stage phase reconstruction method. In the proposed method, DNNs estimate phase derivatives instead of phase itself, which allows us to avoid the sensitivity problem. Then, phase is recursively estimated based on the estimated derivatives, which is named recurrent phase unwrapping (RPU). The experimental results confirm that the proposed method outperformed the direct phase estimation by a DNN.
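The second stage described in this abstract, recovering phase recursively from estimated derivatives, can be sketched in a simplified form. The example below is only a minimal illustration that assumes time-direction derivatives alone and a known initial frame; it is not the cited paper's recurrent phase unwrapping, whose way of combining derivatives is not given in the abstract. The names `wrap` and `integrate_phase` are hypothetical, and wrapped differences of a toy ground-truth phase stand in for DNN outputs.

```python
import cmath
import math
from typing import List


def wrap(angle: float) -> float:
    """Wrap an angle to the principal interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def integrate_phase(initial: List[float],
                    time_derivs: List[List[float]]) -> List[List[float]]:
    """Rebuild phase[t][k] recursively as phase[t-1][k] plus a time-direction derivative."""
    phase = [list(initial)]
    for deriv in time_derivs:
        prev = phase[-1]
        phase.append([wrap(p + d) for p, d in zip(prev, deriv)])
    return phase


# Toy ground truth: each bin's phase advances at a constant rate per frame.
rates = [0.3, 2.5]
true_phase = [[wrap(0.1 + r * t) for r in rates] for t in range(6)]

# Wrapped frame-to-frame differences stand in for estimated derivatives.
derivs = [[wrap(b - a) for a, b in zip(true_phase[t - 1], true_phase[t])]
          for t in range(1, 6)]

rebuilt = integrate_phase(true_phase[0], derivs)

# Compare on the unit circle so that wrapping at the interval edge does not matter.
max_err = max(abs(cmath.exp(1j * a) - cmath.exp(1j * b))
              for row_a, row_b in zip(true_phase, rebuilt)
              for a, b in zip(row_a, row_b))
```

Because the derivatives are differences, shifting the whole waveform changes the phase values but leaves the quantities being estimated largely unchanged, which is the sensitivity argument the abstract makes.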


“…Removing unwanted environmental background noise in speech signals is a common step in speech processing applications. Complex valued neural networks as well as phase estimation have been of great interest in speech enhancement lately, since the perceptual audio quality has been reported to be improved significantly [5,7,6,10].…”

Section: Related Work (mentioning)

confidence: 99%

“…While other work uses the whole signal in an off-line processing fashion as input for the noise reduction [9,10,11], our work requires real-time capabilities. Both high-res spectrograms and off-line processing are not feasible for hearing aid applications, where the overall latency is a very important property.…”

Section: Introduction (mentioning)

confidence: 99%

CLCNET: Deep Learning-Based Noise Reduction for Hearing aids using Complex Linear Coding

Schröter, Rosenkranz, Escalante-B., et al.

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Noise reduction is an important part of modern hearing aids and is included in most commercially available devices. Deep learning-based state-of-the-art algorithms, however, either do not consider real-time and frequency resolution constraints or result in poor quality under very noisy conditions. To improve monaural speech enhancement in noisy environments, we propose CLCNet, a framework based on complex valued linear coding. First, we define complex linear coding (CLC) motivated by linear predictive coding (LPC) that is applied in the complex frequency domain. Second, we propose a framework that incorporates complex spectrogram input and coefficient output. Third, we define a parametric normalization for complex valued spectrograms that complies with low-latency and on-line processing. Our CLCNet was evaluated on a mixture of the EUROM database and a real-world noise dataset recorded with hearing aids and compared to traditional real-valued Wiener-Filter gains.
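One way to picture "complex spectrogram input and coefficient output" is the sketch below, which assumes that each enhanced bin is a complex-weighted sum of the current and a few past noisy frames in the same frequency band, by analogy with LPC. That formulation, the function name `apply_clc`, and the zero-padding before the first frame are assumptions for illustration; the abstract does not spell out the exact definition, and the coefficients here are hand-picked rather than predicted by a network.

```python
from typing import List


def apply_clc(noisy: List[complex], coefs: List[List[complex]]) -> List[complex]:
    """Apply per-frame complex coefficients to one frequency band.

    noisy[t] is the noisy STFT value at frame t; coefs[t][i] weights noisy[t - i].
    Only current and past frames are used, so the operation is causal.
    """
    enhanced: List[complex] = []
    for t, frame_coefs in enumerate(coefs):
        acc = 0j
        for i, a in enumerate(frame_coefs):
            if t - i >= 0:  # frames before the start are treated as zero
                acc += a * noisy[t - i]
        enhanced.append(acc)
    return enhanced


# One hypothetical frequency band over three frames, with order-1 coefficients.
band = [1 + 1j, 2 + 0j, 0 - 1j]
coefs = [[1 + 0j, 0j], [0.5 + 0j, 0.5 + 0j], [0j, 1j]]
out = apply_clc(band, coefs)
```

Using only current and past frames keeps the operation causal, which matches the low-latency, on-line constraint the abstract emphasizes.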

