Interspeech 2016
DOI: 10.21437/interspeech.2016-216

Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks

Abstract: Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sou…
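
As a rough illustration of the two mask types the abstract contrasts, the sketch below builds a soft (ratio) mask and a binary mask for one source and applies each to a mixture magnitude spectrogram. It is not the paper's method: the masks are computed from known toy source magnitudes rather than predicted by a DNN, the combining network is not shown, and the function names, the toy values, and the assumption that magnitudes add in the mixture are all hypothetical.

```python
import numpy as np


def soft_mask(s1_mag, s2_mag, eps=1e-8):
    """Soft (ratio) mask for source 1; every value lies in [0, 1]."""
    return s1_mag / (s1_mag + s2_mag + eps)


def binary_mask(s1_mag, s2_mag):
    """Binary mask for source 1: 1 where source 1 dominates, else 0."""
    return (s1_mag > s2_mag).astype(s1_mag.dtype)


def apply_mask(mask, mix_mag):
    """Estimate a source by element-wise masking of the mixture magnitudes."""
    return mask * mix_mag


# Hypothetical toy magnitudes (frequency bins x time frames).
s1 = np.array([[4.0, 1.0], [0.5, 3.0]])
s2 = np.array([[1.0, 1.0], [2.0, 1.0]])
mix = s1 + s2  # assumption: source magnitudes add in the mixture

soft = soft_mask(s1, s2)
binary = binary_mask(s1, s2)

est_soft = apply_mask(soft, mix)      # keeps a fraction of every bin
est_binary = apply_mask(binary, mix)  # keeps or discards whole bins
```

In this toy case the binary estimate assigns each bin entirely to one source, so it zeroes bins where source 1 is present but weaker (more distortion) while passing the full mixture energy only where source 1 dominates. The soft estimate scales every bin, so some energy from the other source leaks through wherever both are active.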

Citations: cited by 31 publications (28 citation statements)
References: 29 publications
“…Nowadays, deep learning solutions seem to be the new El Dorado for audio processing. Still, most studies deal with single-channel denoising / enhancement / separation algorithms [102,22,51,55,139,86]. More recently, multichannel processing solutions that employ deep learning [104,23], as well as robust ASR systems [60], have been proposed.…”
Section: Discussion (mentioning)
confidence: 99%
“…From Eq. (16) and (17), it can be seen that IRM-based approaches could deliver a less distorted enhanced speech, while it could potentially involves much interference [87]. Wang and Wang [88] first introduced DNNs to perform IBM estimation for speech separation, and reported large performance improvement over non-DNN-based methods.…”
Section: B Masking-based Deep Enhancement Methods (mentioning)
confidence: 99%
“…This conclusion was further supported by the work in [90], where the obtained results suggested that IRM achieves better ASR performance than IBM. Further, motivated by the advantages and disadvantages of IBM and IRM, Grais et al [87] combined the IBM- and the IRM-based enhanced (separated) speech by another neural network, to exploit the compensation between two approaches.…”
Section: B Masking-based Deep Enhancement Methods (mentioning)
confidence: 99%
“…After obtaining the spectral weighting masks for dialogue content separation from multiple modules, these are combined with late fusion. The main task for the fusion is that it should improve the performance compared to the single best module, e.g., by locally selecting the best module based on a quality prediction [53], by using a DNN for combining the separation results [54,55], or by a weighted Table 1. Separation performance comparison of the proposed combined method (All), and the individual separation modules (Secs.…”
Section: Fusion Of Separation Modules (mentioning)
confidence: 99%