2019
DOI: 10.1109/jstsp.2019.2912565

Integration of Neural Networks and Probabilistic Spatial Models for Acoustic Blind Source Separation

Abstract: Despite a lot of progress in speech separation, enhancement, and automatic speech recognition, realistic meeting recognition is still fairly unsolved. Most research on speech separation focuses either on spectral cues to address single-channel recordings or on spatial cues to separate multichannel recordings, and relies exclusively on either neural networks or probabilistic graphical models. Integrating a spatial clustering approach and a deep learning approach using spectral cues in a single framework can significan…

Citations: cited by 34 publications (27 citation statements)
References: 157 publications
“…For example, comparable separation performance to TasNet can be achieved if either the encoder or the decoder are fixed to be the STFT or ISTFT. This opens the way to employ common frequency domain beamforming techniques for source extraction, which have been shown to deliver superior results to mask-based source extraction in many studies [15,16,17,18]. Finally, the fixed STFT-based encoder allows for a human interpretability of the masks, while at the same time maintaining a separation close to the excellent performance of TasNet.…”
Section: Introduction (mentioning)
confidence: 97%
“…for i = 1, 2 where λ tpp is the TPP. We note that one can also employ supervised methods such as deep neural network to predict these values [20].…”
Section: The Proposed Adaptive Non-linear MSNR (ANL-MSNR) Algorithm (mentioning)
confidence: 99%
“…Recently, these models have been successfully combined with statistical spatial models [39], [40] to improve the performance of the estimator. These combinations have also shown promising results in blind source separation [46]. Therefore, our approach integrates the use of statistical signal processing with deep learning for the difficult situation where classical assumptions are no longer valid.…”
Section: A Priori SPP Estimation Based on Deep Neural Network (mentioning)
confidence: 99%