2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015
DOI: 10.1109/asru.2015.7404829

BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge

Cited by 125 publications
(103 citation statements)
References 12 publications
“…In the following, we do not discuss DNN post-filters, which provided a limited improvement or degradation on both real and simulated data (Hori et al, 2015;Sivasankaran et al, 2015), and we focus on multichannel DNN-based enhancement instead. Table 5 illustrates the performance of the DNN-based time-invariant generalized eigenvalue (GEV) beamformer proposed by Heymann et al (2015). This beamformer is similar to the mask-based MVDR beamformer of Yoshioka et al (2015) mentioned in Section 3.2.1, except that the time-frequency mask from which the multichannel statistics of speech and noise are computed is estimated via a DNN instead of a clustering technique.…”
Section: DNN-based Beamforming and Separation (mentioning)
confidence: 99%
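The statement above describes computing speech and noise multichannel statistics from a time-frequency mask and feeding them to a time-invariant GEV beamformer. The following is a minimal NumPy sketch of that idea, not the implementation of Heymann et al. Everything here is an assumption made for illustration: the function names (`masked_covariance`, `gev_weights`, `beamform`), the synthetic data, and the binary activity mask that stands in for a DNN-estimated mask. The generalized eigenvalue problem is solved by Cholesky whitening of the noise covariance, which is one of several possible choices.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance.
    Y: (F, T, C) complex STFT, mask: (F, T) values in [0, 1]. Returns (F, C, C)."""
    # Sum over time of mask * y y^H, normalized by the total mask weight.
    num = np.einsum("ft,ftc,ftd->fcd", mask, Y, Y.conj())
    den = np.maximum(mask.sum(axis=1), 1e-10)
    return num / den[:, None, None]

def gev_weights(phi_s, phi_n, eps=1e-6):
    """Principal generalized eigenvector of (phi_s, phi_n) for each frequency."""
    F, C, _ = phi_s.shape
    W = np.empty((F, C), dtype=complex)
    for f in range(F):
        # Small diagonal loading keeps the noise covariance positive definite.
        load = eps * np.trace(phi_n[f]).real / C
        L = np.linalg.cholesky(phi_n[f] + load * np.eye(C))
        L_inv = np.linalg.inv(L)
        # Whitened problem: L^-1 phi_s L^-H v = lambda v, with w = L^-H v.
        A = L_inv @ phi_s[f] @ L_inv.conj().T
        A = 0.5 * (A + A.conj().T)  # enforce Hermitian symmetry numerically
        _, vecs = np.linalg.eigh(A)  # eigenvalues come back in ascending order
        W[f] = L_inv.conj().T @ vecs[:, -1]
    return W

def beamform(W, Y):
    """Apply time-invariant weights: out(f, t) = w(f)^H y(f, t)."""
    return np.einsum("fc,ftc->ft", W.conj(), Y)

# Synthetic example (all values hypothetical): one source plus white noise.
rng = np.random.default_rng(0)
F, T, C = 4, 400, 4

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

steering = np.exp(1j * rng.uniform(0, 2 * np.pi, (F, C)))  # unit-modulus steering
source = crandn(F, T)
source[:, T // 2:] = 0.0  # speech is active only in the first half
S = steering[:, None, :] * source[:, :, None]  # speech image at the microphones
N = crandn(F, T, C)  # spatially white noise
Y = S + N

# Stand-in for the DNN output: a binary speech-activity mask and its complement.
speech_mask = np.zeros((F, T))
speech_mask[:, : T // 2] = 1.0
noise_mask = 1.0 - speech_mask

W = gev_weights(masked_covariance(Y, speech_mask), masked_covariance(Y, noise_mask))

snr_in = np.sum(np.abs(S[:, :, 0]) ** 2) / np.sum(np.abs(N[:, :, 0]) ** 2)
snr_out = np.sum(np.abs(beamform(W, S)) ** 2) / np.sum(np.abs(beamform(W, N)) ** 2)
print(f"input SNR {10 * np.log10(snr_in):.1f} dB, output SNR {10 * np.log10(snr_out):.1f} dB")
```

In this toy setup the beamformer output should show a higher SNR than a single microphone, because the weights maximize the ratio of speech-dominated to noise-dominated output power. A real system would replace the activity mask with per-bin mask estimates and would usually add a post-filter to limit the speech distortion that GEV weights can introduce.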
“…Table 8: WER (%) achieved on noisy data using various acoustic models trained on noisy real and simulated data (all channels) without sMBR (Yoshioka et al, 2015). It must be noted that, with the exception of Vu et al (2015), all challenge entrants trained GMM-HMM and DNN-HMM acoustic models on real and simulated data. Heymann et al (2015) found that discarding real data and training a GMM-HMM acoustic model on simulated data only increases the WER by 3% and 4% relative on real development and test data, respectively. This minor degradation is mostly due to the smaller size of the training set and it proves without doubt that acoustic models are able to leverage simulated data to learn about real data.…”
Section: Acoustic Modeling (mentioning)
confidence: 99%
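The statement above reports WER increases "relative" to a baseline, which is easy to confuse with an absolute change in percentage points. The short sketch below shows the distinction. The function name and the WER values are hypothetical and are not taken from the cited papers.

```python
def relative_wer_increase(wer_baseline, wer_new):
    """Relative WER change in percent, measured against the baseline WER."""
    return 100.0 * (wer_new - wer_baseline) / wer_baseline

# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
baseline_wer = 20.0        # model trained on real and simulated data
simulated_only_wer = 20.6  # model trained on simulated data only

absolute_increase = simulated_only_wer - baseline_wer
relative_increase = relative_wer_increase(baseline_wer, simulated_only_wer)
print(f"absolute: {absolute_increase:.1f} points, relative: {relative_increase:.1f}%")
```

With these made-up values, a 3% relative increase corresponds to well under one percentage point of absolute WER.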
“…Most systems show a large spread of performance across recording sessions (Figure 10). In particular, Hori et al (2015), Sivasankaran et al (2015), Tran et al (unpublished) and Heymann et al (2015), have overall WERs in the 9-12% range, but have WERs of 20-30% on at least one session. It is possible that these bad session scores are due to a lack of robustness to the microphone errors, given that channel errors have themselves been shown to be concentrated within sessions.…”
Section: Characterising System Performance (mentioning)
confidence: 99%
“…For example, Yoshioka et al (2015) apply a time-frequency mask when estimating the steering vector. Heymann et al (2015) employ a DNN to perform the necessary speech and noise covariance estimates. Other teams have employed a conventional delay and sum beamformer (e.g., Sivasankaran et al, 2015;Hori et al, 2015;Prudnikov et al, 2015).…”
Section: Target Enhancement (mentioning)
confidence: 99%