Interspeech 2019
DOI: 10.21437/Interspeech.2019-2549

Unsupervised Training of Neural Mask-Based Beamforming

Abstract: We present an unsupervised training approach for a neural network-based mask estimator in an acoustic beamforming application. The network is trained to maximize a likelihood criterion derived from a spatial mixture model of the observations. It is trained from scratch without requiring any parallel data consisting of degraded input and clean training targets. Thus, training can be carried out on real recordings of noisy speech rather than simulated ones. In contrast to previous work on unsupervised training o…
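
The abstract's training criterion, maximizing the likelihood of the observations under a spatial mixture model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes PyTorch, a mask network whose outputs act as class posteriors of a complex angular central Gaussian mixture model (cACGMM), and multi-channel STFT vectors normalized to unit norm; the constant factor of the cACG density is dropped because it does not affect the gradient.

    import torch

    def cacgmm_neg_log_likelihood(masks, Z, eps=1e-6):
        """Negative cACGMM log-likelihood, usable as an unsupervised loss.

        masks: (K, F, T) class posteriors produced by the mask network
        Z:     (F, T, D) multi-channel STFT vectors with unit norm per bin
        (K classes, F frequency bins, T frames, D microphones)
        """
        D = Z.shape[-1]
        # Parameter estimates (M-step style) from the network's masks:
        outer = torch.einsum('ftd,fte->ftde', Z, Z.conj())            # rank-1 outer products
        B = torch.einsum('kft,ftde->kfde', masks.to(Z.dtype), outer)  # (K, F, D, D)
        B = B / masks.sum(dim=2)[..., None, None].clamp(min=eps)
        B = B + eps * torch.eye(D, dtype=B.dtype, device=B.device)    # regularize
        pi = masks.mean(dim=2).clamp(min=eps)                         # mixture weights (K, F)
        # Per-class log-density at each TF bin (constants dropped):
        # log A(z; B) = -log det(B) - D * log(z^H B^{-1} z) + const
        Binv_z = torch.linalg.solve(B[:, :, None], Z[None, ..., None]).squeeze(-1)
        quad = torch.einsum('ftd,kftd->kft', Z.conj(), Binv_z).real.clamp(min=eps)
        log_comp = (torch.log(pi) - torch.logdet(B).real)[..., None] - D * torch.log(quad)
        return -torch.logsumexp(log_comp, dim=0).mean()

Calling loss = cacgmm_neg_log_likelihood(mask_net(features), Z) and backpropagating then trains the mask network end to end without any clean targets, which is the sense in which the training is unsupervised.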

Cited by 16 publications (17 citation statements: 0 supporting, 17 mentioning, 0 contrasting; citing publications span 2019–2023).
References 29 publications.
“…In contrast, the latter system is better tuned towards performance, relying on a state-of-the-art spatial clustering model and reporting competitive WERs. In [177] we presented a different take on unsupervised mask estimation: the likelihood under the assumption that the data follows a cACGMM is used as a maximization criterion to train a neural network which merely provides the initialization for a single EM step of the cACGMM parameter estimation process. That way, the network is encouraged to provide a mask as initialization that is close to optimal, thus leading to a higher likelihood.…”
Section: Unsupervised Training Using Multi-channel Features
Citation type: mentioning
confidence: 99%
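
The "single EM step" in the quoted statement can be made concrete: the network's masks drive one M-step (as in the sketch above), and the resulting parameters yield refined posteriors in an E-step, whose likelihood is then maximized. A hypothetical helper under the same assumptions (PyTorch, unit-norm STFT vectors, cACG constants dropped):

    import torch

    def cacgmm_posteriors(Z, B, pi, eps=1e-6):
        """E-step: class posteriors given cACGMM parameters (B, pi) that were
        re-estimated from the network's initialization masks in the M-step.
        Z: (F, T, D) unit-norm STFT vectors, B: (K, F, D, D), pi: (K, F)."""
        D = Z.shape[-1]
        Binv_z = torch.linalg.solve(B[:, :, None], Z[None, ..., None]).squeeze(-1)
        quad = torch.einsum('ftd,kftd->kft', Z.conj(), Binv_z).real.clamp(min=eps)
        log_comp = (torch.log(pi) - torch.logdet(B).real)[..., None] - D * torch.log(quad)
        # Normalizing over classes yields the refined masks; maximizing the
        # likelihood after this step rewards good initializations by the network.
        return torch.softmax(log_comp, dim=0)                          # (K, F, T)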
“…Seetharaman et al. [13] designed a loss function weighted by a confidence measure of the estimated references. Drude et al. [15] also proposed a novel approach that directly trains a separation network from the cACGMM likelihood. They applied the method to noisy speech recordings and reported that automatic speech recognition performance was superior to that of their previously mentioned approach.…”
Section: Unsupervised Training Of Neural Source Separation
Citation type: mentioning
confidence: 99%
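
Downstream, the estimated masks drive the acoustic beamformer. A common choice, assumed here for illustration since the quoted papers do not prescribe this exact variant, is a Souden-style MVDR beamformer built from mask-weighted spatial covariance matrices:

    import torch

    def mvdr_from_masks(Y, speech_mask, noise_mask, ref=0, eps=1e-6):
        """Sketch of a mask-driven MVDR beamformer (Souden formulation).
        Y: (F, T, D) multi-channel STFT, masks: (F, T), ref: reference mic."""
        ms, mn = speech_mask.to(Y.dtype), noise_mask.to(Y.dtype)
        phi_s = torch.einsum('ft,ftd,fte->fde', ms, Y, Y.conj())   # speech SCM
        phi_n = torch.einsum('ft,ftd,fte->fde', mn, Y, Y.conj())   # noise SCM
        phi_n = phi_n + eps * torch.eye(Y.shape[-1], dtype=Y.dtype, device=Y.device)
        # w = Phi_n^{-1} Phi_s e_ref / trace(Phi_n^{-1} Phi_s)
        num = torch.linalg.solve(phi_n, phi_s)                     # (F, D, D)
        w = num[..., ref] / (torch.einsum('fdd->f', num)[:, None] + eps)
        return torch.einsum('fd,ftd->ft', w.conj(), Y)             # beamformed STFT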
“…Unsupervised training for neural source separation using multichannel mixture signals has recently gained a lot of attention [12][13][14][15]. One approach is to generate supervised data by using multichannel separation methods [12][13][14].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
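
The first approach mentioned here, generating training targets with a multi-channel separation method, amounts to a teacher-student scheme. The toy sketch below uses illustrative stand-ins that are not from the cited papers: teacher_masks is a placeholder for a full spatial-clustering run, and the network, features, and data are dummies.

    import torch

    # Illustrative stand-ins: a tiny mask network and a placeholder "teacher"
    # that a real system would replace with a multi-channel separation method
    # such as an EM-fitted spatial mixture model.
    mask_net = torch.nn.Sequential(torch.nn.Linear(257, 257), torch.nn.Sigmoid())
    opt = torch.optim.Adam(mask_net.parameters(), lr=1e-3)

    def teacher_masks(Y):
        # Placeholder: channel-0 magnitude over the summed channel magnitudes.
        return (Y.abs()[..., 0] / Y.abs().sum(-1).clamp(min=1e-6)).clamp(0.0, 1.0)

    dataset = [torch.randn(100, 257, 4, dtype=torch.cfloat)]  # dummy (T, F, D) STFTs

    for Y in dataset:
        with torch.no_grad():
            target = teacher_masks(Y)          # pseudo-label from the spatial method
        pred = mask_net(Y.abs()[..., 0])       # the student sees a single channel
        loss = torch.nn.functional.binary_cross_entropy(pred, target)
        opt.zero_grad(); loss.backward(); opt.step()

The likelihood-based method quoted above avoids this two-stage setup by backpropagating through the mixture-model objective directly.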
“…Consequently, supervised source separation using neural networks relies on the availability of paired mixture-clean data in the training set and cannot be used when such paired datasets are unavailable or expensive to collect. To relax these constraints, a few recent papers use other forms of information, such as the spatial separation between the sources in a multi-microphone setting, to train the networks for unsupervised source separation [7,8,9,10]. However, these constraints continue to impose restrictions on single-channel source separation, where such secondary forms of information about the sources are not available.…”
Section: Introduction
Citation type: mentioning
confidence: 99%