2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015
DOI: 10.1109/asru.2015.7404828

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices

Cited by 212 publications (183 citation statements)
References 25 publications

“…Table 5 illustrates the performance of the DNN-based time-invariant generalized eigenvalue (GEV) beamformer proposed by Heymann et al (2015). This beamformer is similar to the mask-based MVDR beamformer of Yoshioka et al (2015) mentioned in Section 3.2.1, except that the time-frequency mask from which the multichannel statistics of speech and noise are computed is estimated via a DNN instead of a clustering technique. It is followed by a time-invariant blind analytic normalization (BAN) filter which rescales the beamformer output to ensure unit gain for the speaker signal.…”
Section: DNN-based Beamforming and Separation
Citation type: mentioning
confidence: 99%
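
The pipeline described in the statement above (time-frequency mask, then masked multichannel statistics of speech and noise, then a GEV beamformer, then BAN rescaling) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the implementation of either cited paper: the function names, the array layout `(F, T, C)`, and the diagonal-loading constant are hypothetical, the mask is taken as given (whether it comes from a DNN or from clustering), and the BAN gain uses one commonly cited form, sqrt(w^H Φ_N Φ_N w / C) / (w^H Φ_N w), which does not appear on this page.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, T, C) = (frequencies, frames, channels)
    mask: real-valued weights in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    weighted = mask[..., None] * Y
    # sum over frames of m(t, f) * y(t, f) y(t, f)^H
    cov = np.einsum('ftc,ftd->fcd', weighted, Y.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def gev_weights(phi_x, phi_n, loading=1e-6):
    """Time-invariant GEV beamformer: principal generalized eigenvector
    of (phi_x, phi_n) in each frequency bin. Returns shape (F, C)."""
    F, C, _ = phi_x.shape
    W = np.empty((F, C), dtype=complex)
    for f in range(F):
        # Assumed regularization: small diagonal loading keeps phi_n invertible.
        scale = np.trace(phi_n[f]).real / C
        n = phi_n[f] + loading * scale * np.eye(C)
        L = np.linalg.cholesky(n)                # n = L L^H
        L_inv = np.linalg.inv(L)
        A = L_inv @ phi_x[f] @ L_inv.conj().T    # noise-whitened speech covariance
        A = 0.5 * (A + A.conj().T)               # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
        W[f] = L_inv.conj().T @ vecs[:, -1]      # map back: w = L^{-H} v
    return W


def ban_gain(W, phi_n, eps=1e-10):
    """Blind analytic normalization gain per frequency bin (assumed form)."""
    C = W.shape[1]
    num = np.einsum('fc,fcd,fde,fe->f', W.conj(), phi_n, phi_n, W).real
    den = np.einsum('fc,fcd,fd->f', W.conj(), phi_n, W).real
    return np.sqrt(num / C) / (den + eps)


def apply_beamformer(W, Y):
    """Apply w(f)^H y(t, f) to every frame. Returns shape (F, T)."""
    return np.einsum('fc,ftc->ft', W.conj(), Y)
```

In use, `phi_x` and `phi_n` would come from `masked_covariance(Y, speech_mask)` and `masked_covariance(Y, noise_mask)`, and the enhanced signal would be `ban_gain(W, phi_n)[:, None] * apply_beamformer(W, Y)`. Solving the generalized eigenproblem through a Cholesky whitening step keeps the sketch NumPy-only; a dedicated generalized Hermitian eigensolver would serve equally well.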
“…A few challenge entries also employed multichannel dereverberation techniques based on time-domain linear prediction (Yoshioka et al, 2010) or interchannel coherence-based time-frequency masking (Schwarz and Kellermann, 2014). As expected, these techniques improved performance on real data but made a smaller difference or even degraded performance on simulated data due to the fact that it did not include any early reflection or reverberation (Yoshioka et al, 2015; Barfuss et al, 2015; Pang and Zhu, 2015).…”
Section: 1 Beamforming and Post-filtering
Citation type: mentioning
confidence: 99%