Neural network based spectral mask estimation for acoustic beamforming

Heymann, Jahn; Drude, Lukas; Haeb‐Umbach, Reinhold

doi:10.1109/icassp.2016.7471664

Cited by 413 publications

(348 citation statements)

References 14 publications

Mentioning: 346

“…In the case of competing speakers, it is necessary that the network be given some additional information to identify the target. In previous works, the input of the network was either the magnitude spectrum of the mixture [7] or of the mixture processed with a simple delay-and-sum beamformer [13]. We propose to combine the magnitude spectra of the mixture observed at the omnidirectional channel W, x_W(t, f), and of the output of the HOA beamformer pointing toward the target, ŝ(t, f):…”

Section: Structure of the Solution (mentioning)

confidence: 99%

“…The application of deep neural networks (DNNs) to source separation has allowed for drastic improvement of ASR accuracy in real-world conditions [7]. DNNs were originally applied to single-channel inputs to derive a single-channel filter, a.k.a.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, in these two studies, the mask estimated by the DNN is still applied as a single-channel filter only. Recent approaches that derive DNN-based multichannel filters have proven very promising [7,13]. These include various beamformers derived from the speech and noise covariance matrices computed from the output mask [7,14] or an MWF derived by expectation-maximization [13].…”

Section: Introduction (mentioning)

confidence: 99%

Multichannel Speech Separation with Recurrent Neural Networks from High-Order Ambisonics Recordings

Perotin

Serizel

Guérin

2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

“…All these algorithms require the knowledge of either the direction of arrival (DOA) or the speech activity to compute the filters and are sensitive to signal mismatches [14] or detection errors [6]. Deep learning-based approaches have been proposed to estimate accurately these quantities through the prediction of a time-frequency (TF) mask [15,16,17] or of the spectrum of the desired signals [18]. Although often used in a multichannel context, most of these solutions use single-channel data as input of their deep neural networks (DNNs).…”

Section: Introduction (mentioning)

confidence: 99%

DNN-based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays

Furnon

Serizel

Illina

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Multichannel processing is widely used for speech enhancement but several limitations appear when trying to deploy these solutions in the real world. Distributed sensor arrays that consider several devices with a few microphones is a viable solution which allows for exploiting the multiple devices equipped with microphones that we are using in our everyday life. In this context, we propose to extend the distributed adaptive node-specific signal estimation approach to a neural network framework. At each node, a local filtering is performed to send one signal to the other nodes where a mask is estimated by a neural network in order to compute a global multichannel Wiener filter. In an array of two nodes, we show that this additional signal can be leveraged to predict the masks and leads to better speech enhancement performance than when the mask estimation relies only on the local signals.

“…For robust ASR of a single (i.e., not overlapped) speaker, mask-based adaptive beamforming [1]-[3] has recently turned out to be highly effective. This approach was employed in the best-performing system [2], [4] in CHiME-3 [5] and CHiME-4.…”

Section: Introduction (mentioning)

confidence: 99%

Data-driven and physical model-based designs of probabilistic spatial dictionary for online meeting diarization and adaptive beamforming

Ito

Araki

Nakatani

2017

2017 25th European Signal Processing Conference (EUSIPCO)

Abstract: In this paper, we comparatively study alternative dictionary designs for recently proposed meeting diarization and adaptive beamforming based on a probabilistic spatial dictionary. This dictionary models the feature distribution for each possible direction of arrival (DOA) of speech signals and the feature distribution for background noise. The dictionary enables online DOA detection, which in turn enables online diarization. Here we describe data-driven and physical model-based designs of the dictionary. Experiments on a meeting dataset showed that a physical model-based dictionary gave a word error rate (WER) of 24.9 %, which is close to that for the best-performing data-driven dictionary (24.1 %). Therefore, the former has a significant advantage over the latter that it allows us to bypass the cumbersome measurement of training data without much degrading the performance of the automatic speech recognition (ASR).
