2017
DOI: 10.1016/j.csl.2016.11.003

Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures

Abstract: Automatic speech recognition in everyday environments must be robust to significant levels of reverberation and noise. One strategy to achieve such robustness is multi-microphone speech enhancement. In this study, we present results of an evaluation of different speech enhancement pipelines using a state-of-the-art ASR system for a wide range of reverberation and noise conditions. The evaluation exploits the recently released ACE Challenge database which includes measured multichannel acoustic impulse response…

Cited by 29 publications (18 citation statements)
References 11 publications
“…Please note that the correlation score between the CEG and WER for multi-condition training tends to be smaller than that for clean-condition training because the number of the same scores of WER corresponding to different scores of CEG for multi-condition training is larger, which leads to worse correlation statistics. Unlike the conclusions in [10,11], we observe that the correlation scores between the STOI and WER tend to be larger than those between the PESQ and WER for multi-condition training and the contrary conclusion could be drawn for clean-condition training, which may indicate that the recognition performance depends more largely on speech quality for clean-condition training and on speech intelligibility for multi-condition training. Besides, it is noted that the acoustic confidence measure and the proposed CEG have positive correlations with WER, on the contrary, PESQ and STOI have negative correlations with WER.…”
Section: Experimental Setting (contrasting)
confidence: 99%
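The sign pattern noted in this excerpt follows from how each score is oriented: a score that rises as recognition degrades correlates positively with WER, while PESQ and STOI, which fall as speech degrades, correlate negatively. Below is a minimal sketch of the Pearson correlation coefficient, assuming that is the "correlation score" meant (the excerpt does not name the statistic). All score values are made-up toy numbers, not results from either paper, and `distortion` is a hypothetical higher-is-worse measure.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(vx * vy)


# Toy per-condition scores (illustrative only, not from the cited papers).
wer = [12.0, 18.5, 25.0, 40.2]         # word error rate in %, higher is worse
stoi = [0.93, 0.88, 0.80, 0.65]        # intelligibility, higher is better
distortion = [0.10, 0.30, 0.50, 0.90]  # hypothetical measure, higher is worse

print(f"r(STOI, WER)       = {pearson(stoi, wer):+.3f}")        # negative
print(f"r(distortion, WER) = {pearson(distortion, wer):+.3f}")  # positive
```

When comparing measures, as the excerpt does for STOI and PESQ, it is the magnitude |r| that indicates which measure tracks WER more closely; the sign only reflects each measure's orientation.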
“…For example, resulting speech enhanced by OM-LSA could improve recognition accuracy for clean-condition training regardless of its worse STOI shown in Table 2. Accordingly, the conclusion in [10,11] that the correlation coefficient between the WER and STOI is higher than other distortion measures (e.g., PESQ) is not accurate and reliable enough. Some researches [22,32] suggested by the conclusion in [10,11] designed a speech enhancement frontend to especially improve STOI and thus achieve better ASR performance.…”
Section: Comparison Of Evaluation Accuracy (mentioning)
confidence: 99%
“…Recent studies have reported a positive correlation between objective intelligibility scores and ASR performance [27,32]. In Table 2, we show the STOI and PESQ scores of enhanced speech processed by RLSE1 and RLSE2 at SNR levels of 0 and 5 dB.…”
Section: Results (mentioning)
confidence: 97%
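The excerpt evaluates enhanced speech at SNR levels of 0 and 5 dB. A common way to create test material at a fixed SNR is to scale a noise signal so that the clean-to-noise power ratio hits the target. The sketch below assumes that additive-mixing recipe (the citing paper's exact procedure is not stated here); the function names, the 16 kHz sample rate, and the sine-tone stand-in for speech are all hypothetical.

```python
import math
import random


def power(x: list[float]) -> float:
    """Mean power (average squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Add noise to clean speech, scaled so the clean-to-noise power ratio equals snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10*log10(P_clean / (g^2 * P_noise))  =>  solve for the noise gain g
    gain = math.sqrt(power(clean) / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean: list[float], mixture: list[float]) -> float:
    """SNR of a mixture, treating (mixture - clean) as the added noise."""
    residual = [m - c for m, c in zip(mixture, clean)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    fs = 16000  # assumed sample rate in Hz
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]  # 1 s tone as stand-in "speech"
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]                   # white Gaussian noise
    for snr in (0.0, 5.0):
        mix = mix_at_snr(clean, noise, snr)
        print(f"target {snr:.1f} dB -> measured {measured_snr_db(clean, mix):.3f} dB")
```

Scaling the noise rather than the speech keeps the speech level identical across SNR conditions, which is a typical (but not universal) design choice.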
“…It has been reported that when the goal is to improve the ASR performance, ideal binary mask (IBM) is more suitable than ideal ratio mask (IRM) or directly mapping [27] to be used to design the SE system. Therefore, we implement an IBM-based SE system in this study.…”
Section: Ideal Binary Mask-Based SE System (mentioning)
confidence: 99%
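This excerpt cites [27] for preferring an ideal binary mask (IBM) over an ideal ratio mask (IRM) when the goal is ASR. The sketch below uses common textbook definitions, as an assumption rather than the citing paper's implementation: an IBM keeps a time-frequency unit when its local SNR exceeds a local criterion (`lc_db`, assumed 0 dB), and an IRM applies the soft gain (S²/(S²+N²))^β with β = 0.5 assumed. "Ideal" means the clean speech and noise spectrograms are known separately (an oracle setting); all function names are hypothetical.

```python
import math
from collections.abc import Sequence

# [frame][frequency bin] magnitude spectrogram, e.g. |STFT| values
Spectrogram = list[list[float]]

EPS = 1e-12  # small floor to avoid division by zero and log(0)


def ideal_binary_mask(speech: Spectrogram, noise: Spectrogram, lc_db: float = 0.0) -> list[list[int]]:
    """IBM: 1 in each time-frequency unit whose local SNR exceeds lc_db, else 0."""
    mask = []
    for s_row, n_row in zip(speech, noise):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10 * math.log10((s * s + EPS) / (n * n + EPS))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


def ideal_ratio_mask(speech: Spectrogram, noise: Spectrogram, beta: float = 0.5) -> Spectrogram:
    """IRM: (S^2 / (S^2 + N^2))^beta per unit, a soft gain in [0, 1]."""
    return [
        [((s * s) / (s * s + n * n + EPS)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def apply_mask(mixture: Spectrogram, mask: Sequence[Sequence[float]]) -> Spectrogram:
    """Multiply the mixture magnitudes by the mask, unit by unit."""
    return [[m * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(mixture, mask)]


speech = [[2.0, 0.5]]  # one frame, two bins (toy values)
noise = [[1.0, 1.0]]
print(ideal_binary_mask(speech, noise))  # [[1, 0]]
print(ideal_ratio_mask(speech, noise))
```

The binary mask makes a hard keep-or-discard decision per unit, whereas the ratio mask attenuates each unit gradually; in a deployed SE system a model would have to estimate either mask from the noisy mixture alone.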