ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054550

Unsupervised Neural Mask Estimator for Generalized Eigen-Value Beamforming Based ASR

Abstract: The state-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this paper, we attempt to move away from the requirement of supervised clean recordings for training the mask estimator. The models based on signal enhancement and beamforming using multi-channel linear prediction serve as the required mask estimate. In …

Cited by 7 publications (7 citation statements)
References 15 publications (25 reference statements)
“…The experiments are performed on REVERB challenge (Kinoshita et al, 2013) and CHiME-3 (Barker et al, 2015) datasets. For the baseline model, we use WPE enhancement (Nakatani et al, 2010) along with unsupervised GEV beamforming (Kumar et al, 2020). This signal is processed with filter-bank energy features (denoted as BF-FBANK).…”
Section: Experiments and Results (citation type: mentioning)
confidence: 99%
“…One common approach to suppress reverberation is to combine all channels by beamforming (Anguera et al, 2007) before feeding it to the ASR system. Recently, unsupervised neural mask estimator for generalized eigen-value beamforming is proposed (Kumar et al, 2020). Traditional pre-processing also includes the weighted prediction error (WPE) (Nakatani et al, 2010) based dereverberation along with the beamforming in most state-of-art far-field ASR systems.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The experiments are performed on REVERB challenge [13] and CHiME-3 [14] datasets. For the baseline model, we use WPE enhancement [8] along with unsupervised GEV beamforming [7]. This signal is processed with filterbank energy features (denoted as BF-FBANK).…”
Section: Experiments and Results (citation type: mentioning)
confidence: 99%
“…One common approach to suppress reverberation is to combine all channels by beamforming [6] before feeding it to the ASR system. Recently, unsupervised neural mask estimator for generalized eigen-value beamforming is proposed [7].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…A common approach in multi-channel recording conditions is to use a weighted and delayed combination of the multiple channels using the technique called beamforming [4]. The current state-of-art approaches to beamforming use a neural mask estimator [5,6]. The speech and noise mask estimations are used to derive the power spectral density of the source and interfering signals for eigen value based beamforming [7].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
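
The last statement above describes how speech and noise masks yield power spectral density (PSD) matrices that drive eigenvalue-based beamforming. A minimal NumPy sketch of that general idea follows. It is not the cited authors' implementation: the function names (`masked_psd`, `gev_weights`, `apply_beamformer`), the array layout (frequency, frame, channel), and the diagonal loading constant are all assumptions made for illustration, and the masks are taken as given rather than produced by a neural estimator.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial PSD matrices (hypothetical helper).

    Y:    complex STFT, shape (F, T, C) = (frequency bins, frames, channels)
    mask: real values in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C), one Hermitian matrix per bin.
    """
    weighted = Y * mask[..., None]
    # psd[f, c, d] = sum_t mask[f, t] * Y[f, t, c] * conj(Y[f, t, d])
    psd = np.einsum("ftc,ftd->fcd", weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return psd / norm[:, None, None]


def gev_weights(psd_speech, psd_noise, eps=1e-6):
    """Per-bin GEV weights: the principal generalized eigenvector of
    (psd_speech, psd_noise), which maximizes the output SNR
    w^H Phi_s w / w^H Phi_n w.
    """
    F, C, _ = psd_speech.shape
    W = np.empty((F, C), dtype=complex)
    for f in range(F):
        # Assumed regularization: small diagonal loading so that the
        # noise PSD is positive definite and Cholesky succeeds.
        load = eps * np.trace(psd_noise[f]).real / C + 1e-10
        phi_n = psd_noise[f] + load * np.eye(C)
        # Whitening: phi_n = L L^H turns the generalized problem into
        # an ordinary Hermitian eigenproblem for L^-1 Phi_s L^-H.
        L_inv = np.linalg.inv(np.linalg.cholesky(phi_n))
        A = L_inv @ psd_speech[f] @ L_inv.conj().T
        A = 0.5 * (A + A.conj().T)  # remove rounding asymmetry
        _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
        W[f] = L_inv.conj().T @ vecs[:, -1]
    return W


def apply_beamformer(W, Y):
    """Beamformed STFT X[f, t] = w_f^H y[f, t], shape (F, T)."""
    return np.einsum("fc,ftc->ft", W.conj(), Y)
```

In this sketch the noise mask would typically be a separate estimate (or the complement of the speech mask), and the GEV solution is only defined up to a complex scale per frequency bin, so practical systems usually add a post-filter or normalization step that is omitted here.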