2017
DOI: 10.1016/j.csl.2016.11.005

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Cited by 292 publications (185 citation statements)
References 57 publications (88 reference statements)
“…The proposed multi-span AM was evaluated by training systems on CHiME4 [28] and AMI [29] using HTK 3.5.1 and PyHTK [30,31]. In the results reported here, the multi-span feature vector p of the concatenated input streams is fed into a simple feed forward DNN with 4 hidden layers each having 512 output nodes and ReLU activation function.…”
Section: Methods (mentioning)
confidence: 99%
“…These five types of noise are also used in the noise-depend evaluation. For noise-independent evaluation, we use 5 different noises from different datasets: pedestrian, cafe, street noises from CHiME-4 [26] dataset and factory2, tank (m109) from NOISEX-92. These noises are all highly non-stationary, which makes speech enhancement be a challenging task.…”
Section: A Experimental Setup (mentioning)
confidence: 99%
“…This estimator cannot be computed in practice when the clean data is unknown, but it provides a lower bound on the word error rate (WER) achievable via uncertainty decoding. For real CHiME-3 data, we computed the OU using "pseudo-clean" features obtained by least-squares subband filtering of the noisy signals using the signal recorded by a close-talk microphone as a reference, as described in [29].…”
Section: Oracle Uncertainty (OU) Estimator (mentioning)
confidence: 99%