2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014
DOI: 10.1109/icassp.2014.6854657

Fusion of multiple uncertainty estimators and propagators for noise robust ASR

Abstract: Uncertainty decoding has been successfully used for speech recognition in highly nonstationary noise environments. Yet, accurate estimation of the uncertainty on the denoised signals and propagation to the features remain difficult. In this work, we propose to fuse the uncertainty estimates obtained from different uncertainty estimators and propagators by linear combination. The fusion coefficients are optimized by minimizing a measure of divergence with oracle estimates on development data. Using the Kullback…
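
The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes zero-mean scalar Gaussians per frame and frequency bin, nonnegative fusion weights (enforced here by optimizing log-weights), plain gradient descent, and KL(oracle || fused) as the divergence, since the abstract is truncated before naming the exact measure. The names `fuse_weights`, `fused_variance`, `kl_gauss_var` and `mean_kl` are hypothetical, and the toy data is synthetic.

```python
import math
from typing import List, Sequence


def kl_gauss_var(oracle_var: float, est_var: float) -> float:
    """KL(N(0, oracle_var) || N(0, est_var)) for scalar zero-mean Gaussians."""
    r = oracle_var / est_var
    return 0.5 * (r - 1.0 - math.log(r))


def fused_variance(estimates: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Linear combination of K variance estimates, each of length N."""
    return [sum(w * est[i] for w, est in zip(weights, estimates))
            for i in range(len(estimates[0]))]


def fuse_weights(estimates: Sequence[Sequence[float]],
                 oracle: Sequence[float],
                 iters: int = 5000, lr: float = 0.2) -> List[float]:
    """Hypothetical helper: fit fusion weights by gradient descent on mean KL.

    Weights are parameterized as exp(theta) so they stay positive (an assumption).
    """
    k, n = len(estimates), len(oracle)
    theta = [math.log(1.0 / k)] * k  # start from equal weights
    for _ in range(iters):
        w = [math.exp(t) for t in theta]
        v = fused_variance(estimates, w)
        grad = [0.0] * k
        for i in range(n):
            # d/dv of KL(N(0, o) || N(0, v)) is 0.5 * (v - o) / v**2
            dkl_dv = 0.5 * (v[i] - oracle[i]) / (v[i] * v[i])
            for j in range(k):
                # chain rule: dv/dw_j = estimates[j][i], dw_j/dtheta_j = w_j
                grad[j] += dkl_dv * estimates[j][i] * w[j] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return [math.exp(t) for t in theta]


def mean_kl(estimates: Sequence[Sequence[float]], oracle: Sequence[float],
            weights: Sequence[float]) -> float:
    """Average divergence between oracle and fused variances."""
    v = fused_variance(estimates, weights)
    return sum(kl_gauss_var(o, e) for o, e in zip(oracle, v)) / len(oracle)


# Synthetic toy check: the "oracle" is an exact mix of two estimators.
s1 = [1.0, 2.0, 1.0, 3.0, 0.5, 2.0]
s2 = [2.0, 1.0, 3.0, 1.0, 2.0, 0.5]
oracle = [0.3 * a + 0.7 * b for a, b in zip(s1, s2)]
w = fuse_weights([s1, s2], oracle)
print([round(x, 3) for x in w], mean_kl([s1, s2], oracle, w))
```

On this toy data the fitted weights should come out close to 0.3 and 0.7. Optimizing log-weights is just one convenient way to keep the fused variance positive; a constrained solver would serve equally well.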

Cited by 13 publications (8 citation statements)
References 22 publications

“…During acoustic model scoring, the uncertainty decoding framework estimates the uncertainty (or variance) of speech distortion in the input features [2–4] in each time frame and modifies the acoustic scores accordingly. The uncertainty can be computed directly in the ASR feature domain [1, 5–10] or propagated from the spectral domain to the feature domain [11–17], under the assumption that it can be represented by a Gaussian distribution. For Gaussian mixture model (GMM) based acoustic models, the expectation of the acoustic scores over this distribution can then be computed in closed form by adding the variance of the uncertainty to that of every Gaussian component [2–4].…”
Section: Introduction (mentioning)

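The closed-form expectation mentioned in this excerpt follows from a standard Gaussian identity: if the clean feature is modeled as x ~ N(x̂, Σ_u) around the enhanced feature x̂, the expected likelihood of a component N(x; μ, Σ) equals N(x̂; μ, Σ + Σ_u). A minimal sketch, assuming diagonal covariances and a single feature vector; `log_gauss_diag` and `gmm_loglik` are hypothetical names.

```python
import math
from typing import Optional, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]],
               uncertainty: Optional[Sequence[float]] = None) -> float:
    """GMM log-likelihood; with uncertainty decoding, the per-dimension
    feature uncertainty is added to every component's variance."""
    if uncertainty is None:
        uncertainty = [0.0] * len(x)
    scores = []
    for w, m, v in zip(weights, means, variances):
        var = [vi + ui for vi, ui in zip(v, uncertainty)]
        scores.append(math.log(w) + log_gauss_diag(x, m, var))
    top = max(scores)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in scores))


# A feature far from the component mean, scored without and with uncertainty.
x = [5.0]
plain = gmm_loglik(x, [1.0], [[0.0]], [[1.0]])
decoded = gmm_loglik(x, [1.0], [[0.0]], [[1.0]], uncertainty=[4.0])
print(plain, decoded)
```

With a large uncertainty, the outlying feature is penalized far less, which is how unreliable features are de-emphasized during decoding.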
“…Conversely, the features are unreliable when their uncertainty is high. The uncertainty is first computed in the spectral domain [4] and then propagated to the feature domain. Because of the nonlinear transform applied to the input spectra, propagation requires approximate methods such as vector Taylor series (VTS) [5], moment matching [6] or the unscented transform [7].…”
Section: Introduction (mentioning)

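To make the propagation step concrete, the sketch below pushes a scalar Gaussian through a log nonlinearity (a stand-in for the log compression in log-Mel or cepstral feature extraction) with the unscented transform, and compares it with a first-order VTS linearization. It is a simplified scalar case under stated assumptions: the sigma-point parameter κ = 2 (so n + κ = 3 for n = 1), and the mean must be large enough relative to the standard deviation that all sigma points stay positive. `unscented_1d` and `vts_first_order_1d` are hypothetical names.

```python
import math
from typing import Callable, Tuple


def unscented_1d(f: Callable[[float], float], mean: float, var: float,
                 kappa: float = 2.0) -> Tuple[float, float]:
    """Propagate N(mean, var) through f using 2n + 1 = 3 sigma points (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def vts_first_order_1d(f: Callable[[float], float],
                       df: Callable[[float], float],
                       mean: float, var: float) -> Tuple[float, float]:
    """First-order VTS: linearize f around the mean."""
    return f(mean), df(mean) ** 2 * var


mu, s2 = 10.0, 4.0  # assumed spectral-domain mean and uncertainty
ut = unscented_1d(math.log, mu, s2)
vts = vts_first_order_1d(math.log, lambda x: 1.0 / x, mu, s2)
print(ut, vts)
```

Both methods are exact for a linear function. For the concave log, the unscented mean falls below log(μ), a curvature effect that first-order VTS ignores.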
“…To overcome this, the estimated uncertainty can be rescaled by a linear transformation [4, 8–10]. In the past [4, 10], the scaling factors were optimized such that the rescaled uncertainty estimates are close to the oracle estimates irrespective of the resulting state hypotheses. This can be considered a suboptimal approach because the same scaling factors are applied to the correct state hypothesis and to the competing state hypotheses.…”
Section: Introduction (mentioning)

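A simple instance of the oracle-based rescaling described here fits one scale factor per feature dimension. Assuming zero-mean Gaussians and KL(oracle || rescaled) summed over frames as the closeness measure (the excerpt does not state the cited works' exact criterion), setting the derivative with respect to α_d to zero gives a closed form: α_d = (1/T) Σ_t o_td / s_td, the average ratio of oracle to estimated variance. `fit_uncertainty_scales` and `rescale` are hypothetical names. Note that this is exactly the state-independent kind of fit the excerpt calls suboptimal: it never looks at which state hypotheses the decoder compares.

```python
from typing import List, Sequence


def fit_uncertainty_scales(est: Sequence[Sequence[float]],
                           oracle: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension alpha_d minimizing sum_t KL(N(0, o_td) || N(0, alpha_d * s_td)).

    est, oracle: T x D nested lists of positive variances.
    Closed form (assumed criterion): alpha_d = mean over t of o_td / s_td.
    """
    num_frames, num_dims = len(est), len(est[0])
    return [sum(oracle[t][d] / est[t][d] for t in range(num_frames)) / num_frames
            for d in range(num_dims)]


def rescale(est: Sequence[Sequence[float]],
            scales: Sequence[float]) -> List[List[float]]:
    """Apply the per-dimension scales to every frame."""
    return [[s * a for s, a in zip(row, scales)] for row in est]


est = [[1.0, 1.0], [2.0, 3.0]]
oracle = [[2.0, 3.0], [4.0, 9.0]]
scales = fit_uncertainty_scales(est, oracle)
print(scales, rescale(est, scales))
```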
“…The distribution of speech distortions is typically approximated as a Gaussian from which the uncertainty or variance of speech distortions is derived. The uncertainty can be computed directly in the ASR feature domain [2, 6–10] or propagated from the spectral domain to the feature domain [1, 11–15].…”
Section: Introduction (mentioning)