2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015
DOI: 10.1109/asru.2015.7404832

A CHiME-3 challenge system: Long-term acoustic features for noise robust automatic speech recognition

Abstract: The paper describes an automatic speech recognition (ASR) system for the 3rd CHiME challenge that addresses noisy acoustic scenes within public environments. The proposed system includes a multi-channel speech enhancement front-end including a microphone channel failure detection method that is based on cross-comparing the modulation spectra of speech to detect erroneous microphone recordings. The main focus of the submission is the investigation of the amplitude modulation filter bank (AMFB) as a method to ex…

Cited by 7 publications (12 citation statements)
References 24 publications
“…The improvement brought by these techniques appears to be quite correlated between real and simulated data. Other authors also found this result to hold for auditory-motivated features such as Gabor filterbank (GBFB) (Martinez and Meyer, 2015) and amplitude modulation filter bank (AMFB) (Moritz et al, 2015) and feature transformation/augmentation methods such as vocal tract length normalization (VTLN) (Tachioka et al, 2015) or i-vectors (Pang and Zhu, 2015; Prudnikov et al, 2015), provided that these features and methods are applied to noisy data or data enhanced using the robust beamforming or source separation techniques listed in Section 3.2. Interestingly, Tachioka et al (2015) found VTLN to yield consistent results on real vs. simulated data when using GEV beamforming as a pre-processing step but opposite results when using MVDR beamforming instead.…”
Section: Robust Features and Feature Normalization
citation type: mentioning
confidence: 84%
“…[20,35] use a Gammatone filterbank that has broader filter tails and has been shown to provide noise robustness. Four systems have used amplitude modulation-based features either by applying a discrete cosine transform (DCT) on the filterbank envelopes [37]; employing a 2D Gabor filter bank [22]; or tracking amplitude modulation (AM) in filterbands using a non-linear Teager energy operator [19].…”
Section: Feature Design
citation type: mentioning
confidence: 99%
“…Several teams have combined multiple architectures (Yoshioka et al, 2015;Du et al, 2015;Zhuang et al, 2015). Performance benefits of the various architectures remain unclear, however it is notable that some of the best scoring systems including Hori et al (2015), Sivasankaran et al (2015) and Moritz et al (2015), have used the baseline DNN configuration.…”
Section: Statistical Modelling
citation type: mentioning
confidence: 99%
“…Examples include, Ma et al (2015) and Du et al (2015) that use a Gammatone filterbank that has broader filter tails and has been shown to perform well in previous robust ASR evaluations. Four systems used amplitude modulation-based features either by applying a discrete cosine transform (DCT) on the filterbank envelopes (Castro Martinez and Meyer, 2015); employing a 2D Gabor filter bank (Moritz et al, 2015); or tracking amplitude modulation (AM) in filterbands using a non-linear Teager energy operator (Hori et al, 2015).…”
Section: Feature Design
citation type: mentioning
confidence: 99%