Distortion-class modeling for robust speech recognition under GSM RPE-LTP coding

Huerta, Juan M.; Stern, Richard M.

doi:10.1016/s0167-6393(00)00055-8

Cited by 8 publications

(4 citation statements)

References 27 publications

“…The basic idea of the WAM method [4,5] is related to the observation described in Section 2 that not all segments of speech in a coded corpus are distorted to the same extent. As noted above, when the speech codec performs a short-term and long-term analysis of the speech signal, the level of distortion introduced by the long-term predictive analysis of the short-term residual can be associated with the predictability of the speech signal.…”

Section: The Weighted Acoustic Modeling Methods (mentioning)

confidence: 99%

“…In this work we focus on reducing the effect of this distortion on recognition accuracy by making use of the Weighted Acoustic Modeling method employing different distortion estimates. In the WAM technique [4,5] a set of acoustic models, each representing a certain distortion condition, is employed during decoding and its contribution to the overall likelihood is weighted by a running estimate of the distortion observed by each observation frame. In this work we estimate the instantaneous distortion in four different ways: using measured cepstral distortion, the long-term gain (adaptive codebook gain), the long-term predictability, and recoding sensitivity.…”

Section: Introduction (mentioning)

confidence: 99%

Instantaneous-distortion based weighted acoustic modeling for robust recognition of coded speech

Huerta, Stern. 2000. 6th International Conference on Spoken Language Processing (ICSLP 2000)

Self Cite

In this paper we apply the Weighted Acoustic Modeling (WAM) technique to the recognition of speech coded by the full-rate GSM codec or the FS-1016 CELP codec employing various estimates of instantaneous distortion. In the WAM method, separate hidden Markov models are developed for regions of speech that exhibit low levels of codec-induced distortion and for regions with higher levels of such distortion. At recognition time, the contributions of these models are mixed together with a weighting that is determined by estimating the instantaneous distortion. In this paper instantaneous distortion was estimated from the instantaneous cepstral distortion, the long-term gain parameter of the codec, the long-term predictability of the reconstructed signal, and measurements of recoding sensitivity. We observe that the use of the long-term gain parameter produces results that are similar to those obtained by use of cepstral distortion (which can only be obtained if the original cepstra are transmitted along with the speech signal) for the GSM codec. Overall, the effect of the degradation in error rate introduced by coding can be reduced by up to 55% with these techniques for GSM coding, and by up to 38% for the CELP coding.
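The mixing step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the weighting function, so the logistic mapping and the names `distortion_to_weight`, `mixed_log_likelihood`, `midpoint`, and `slope` are all assumptions made for the example. It takes per-frame log-likelihoods from a low-distortion model and a high-distortion model and combines them with a weight driven by the frame's distortion estimate.

```python
import math


def distortion_to_weight(distortion, midpoint=1.0, slope=4.0):
    """Weight given to the high-distortion model for one frame.

    Assumption: a logistic curve. The source only says the weight is
    determined by the estimated instantaneous distortion.
    """
    return 1.0 / (1.0 + math.exp(-slope * (distortion - midpoint)))


def mixed_log_likelihood(loglik_low, loglik_high, distortion):
    """Log of w_low * p_low(frame) + w_high * p_high(frame)."""
    w_high = distortion_to_weight(distortion)
    w_low = 1.0 - w_high
    # Drop zero-weight terms so math.log never sees 0, then use
    # log-sum-exp for numerical stability.
    terms = [(w, ll) for w, ll in ((w_low, loglik_low), (w_high, loglik_high)) if w > 0.0]
    peak = max(ll for _, ll in terms)
    return peak + math.log(sum(w * math.exp(ll - peak) for w, ll in terms))


if __name__ == "__main__":
    # A frame with a low distortion estimate leans on the low-distortion model.
    print(mixed_log_likelihood(-5.0, -50.0, distortion=0.1))
```

In a decoder this combination would be applied frame by frame to the state output likelihoods; the distortion estimate could come from any of the four sources the abstract lists, for example the long-term gain parameter.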

“…However, the purpose of noise reduction is sometimes as a preprocessor to, e.g., a speech recognition algorithm. Here, the word error rate increases when the SNR decreases [29,30], but on the other hand, the algorithms are also sensible to distortion of the speech signal [31,32]. In such cases, it might, therefore, be optimal with another relationship between SNR and speech distortion than the one having the best perceptual performance.…”

Section: Simulations (mentioning)

confidence: 99%

Single-channel noise reduction using unified joint diagonalization and optimal filtering

Nørholm, Benesty, Jensen, et al. 2014. EURASIP J. Adv. Signal Process.

In this paper, the important problem of single-channel noise reduction is treated from a new perspective. The problem is posed as a filtering problem based on joint diagonalization of the covariance matrices of the desired and noise signals. More specifically, the eigenvectors from the joint diagonalization corresponding to the least significant eigenvalues are used to form a filter, which effectively estimates the noise when applied to the observed signal. This estimate is then subtracted from the observed signal to form an estimate of the desired signal, i.e., the speech signal. In doing this, we consider two cases, where, respectively, no distortion and distortion are incurred on the desired signal. The former can be achieved when the covariance matrix of the desired signal is rank deficient, which is the case, for example, for voiced speech. In the latter case, the covariance matrix of the desired signal is full rank, as is the case, for example, in unvoiced speech. Here, the amount of distortion incurred is controlled via a simple, integer parameter, and the more distortion allowed, the higher the output signal-to-noise ratio (SNR). Simulations demonstrate the properties of the two solutions. In the distortionless case, the proposed filter achieves only a slightly worse output SNR, compared to the Wiener filter, along with no signal distortion. Moreover, when distortion is allowed, it is possible to achieve higher output SNRs compared to the Wiener filter. Alternatively, when a lower output SNR is accepted, a filter with less signal distortion than the Wiener filter can be constructed.
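The idea in this abstract (jointly diagonalize the two covariance matrices, estimate the noise from the least significant eigenvectors, subtract it) can be sketched with NumPy as below. This is one plausible construction under stated assumptions, not the filter derived in the paper: the function names and the `num_noise_vectors` parameter are hypothetical, the covariance matrices are assumed known, and the noise covariance is assumed positive definite so that a Cholesky factor exists.

```python
import numpy as np


def joint_diagonalize(R_x, R_v):
    """Return (eigvals, B) with B.T @ R_v @ B = I and B.T @ R_x @ B = diag(eigvals).

    Whitening with the Cholesky factor of R_v reduces the generalized
    problem to an ordinary symmetric eigenproblem.
    """
    L = np.linalg.cholesky(R_v)            # R_v = L @ L.T
    L_inv = np.linalg.inv(L)
    whitened = L_inv @ R_x @ L_inv.T
    eigvals, U = np.linalg.eigh(whitened)  # eigenvalues in ascending order
    B = L_inv.T @ U
    return eigvals, B


def noise_subtraction_filter(R_x, R_v, num_noise_vectors):
    """Matrix H such that H @ y is the desired-signal estimate.

    The eigenvectors with the smallest eigenvalues form a noise
    estimator, and its output is subtracted from the observation.
    """
    _, B = joint_diagonalize(R_x, R_v)
    B_noise = B[:, :num_noise_vectors]
    noise_estimator = R_v @ B_noise @ B_noise.T
    return np.eye(R_x.shape[0]) - noise_estimator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.standard_normal((4, 2))
    R_x = basis @ basis.T                  # rank-deficient desired-signal covariance
    A = rng.standard_normal((4, 4))
    R_v = A @ A.T + 4.0 * np.eye(4)        # positive definite noise covariance
    H = noise_subtraction_filter(R_x, R_v, num_noise_vectors=2)
    print(H.shape)
```

When `R_x` is rank deficient and `num_noise_vectors` does not exceed the number of zero eigenvalues, the selected eigenvectors are orthogonal to the desired-signal subspace, so the sketch leaves that signal undistorted. Choosing a larger value removes more noise at the cost of distorting the signal, which mirrors the trade-off the abstract describes.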

“…Investigations have been carried out to determine the influence of speech coding on the performance of speech recognition systems [1,2] and to improve recognition performance in such situations [3,4,5]. Most of this work has a focus on the GSM full-rate coding scheme that was introduced as first coding technique in GSM mobile networks.…”

Section: Introduction (mentioning)

confidence: 99%

The influence of speech coding on recognition performance in telecommunication networks

Hirsch. 2002. 7th International Conference on Spoken Language Processing (ICSLP 2002)

The influence of encoding and decoding speech on automatic speech recognition is investigated in this paper with respect to applications in today's telecommunication networks. The deterioration of recognition performance is presented for several coding schemes in GSM and future mobile networks. The extraction of acoustic features for the recognition is done with the already standardized ETSI frontend and with the advanced robust frontend whose standardization is almost finished. The Aurora2 experiment for recognizing the noisy TIDigits is taken as experimental basis. Finally recognition results are compared to results of subjective listening tests that have been performed for the characterisation of these speech coding schemes.
