Frame-specific statistical features for speaker independent speech recognition

Bocchieri, Enrico; Doddington, George R.

doi:10.1109/tassp.1986.1164911

Cited by 33 publications

(16 citation statements)

References 11 publications


“…A significant aspect of our research is that it represents the effort toward the automatic definition of speech-dependent acoustic parameters, which are subject to statistical optimization rather than relying on heuristic construction. Along this line, we note an earlier work as a representative of the nonparametric (speech-frame based) approach to this problem [3]. Our own earlier parametric (HMM-state based) approach [7], [24] has been extended in this study from the previous level of MFCC to the present level of log-channel energy computed from DFT's, a step closer toward the most primitive form of the data as speech waveform.…”

Section: Summary and Discussion (mentioning)

confidence: 88%

“…As described above, the static features are obtained by a linear transformation of an -dimensional input space for the MFB log channel energies, represented by the vector , to a transformed -dimensional feature space according to (1). Instead of taking the temporal difference of the transformed static features fixed a priori in THMM-1, the dynamic feature vector at frame in THMM-2 is constructed as additional state-dependent, trainable linear combinations of the static features stretching over the interval frames forward and frames backward according to (3) where is the th scalar weighting coefficient associated with the th mixture residing in the Markov state . (Note that in this THMM-2, is trainable, in contrast to THMM-1 where weights are prefixed).…”

Section: B Construction Of State-dependent Joint Transforms For Stat (mentioning)

confidence: 99%
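
The contrast drawn in the statement above, between dynamic features built with prefixed weights and dynamic features built with trainable weights, can be illustrated with a minimal sketch. It is not the cited paper's implementation: the function name `dynamic_features`, the edge handling, and the example weights are assumptions made for illustration. The quoted text describes weights tied to each mixture and Markov state, whereas this sketch takes a single weight vector, which one would supply per state.

```python
def dynamic_features(static, weights):
    """Weighted combination of static feature frames around each frame t.

    static  : list of frames, each a list of floats (T frames x D dims)
    weights : list of 2*K + 1 scalar weights for frame offsets -K..+K
    Offsets that fall outside the utterance reuse the nearest edge frame
    (an assumption of this sketch, not taken from the cited paper).
    """
    if len(weights) % 2 != 1:
        raise ValueError("weights must have odd length 2*K + 1")
    k = len(weights) // 2
    n_frames = len(static)
    dim = len(static[0])
    out = []
    for t in range(n_frames):
        acc = [0.0] * dim
        for offset, w in zip(range(-k, k + 1), weights):
            idx = min(max(t + offset, 0), n_frames - 1)  # clamp at the edges
            for d in range(dim):
                acc[d] += w * static[idx][d]
        out.append(acc)
    return out


# Hypothetical static features: 4 frames, 2 dimensions.
static = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]]

# Weights fixed a priori: a plain temporal difference.
fixed = dynamic_features(static, [-1.0, 0.0, 1.0])

# Stand-in for weights that training would adjust (values are made up).
trained = dynamic_features(static, [-0.5, 0.2, 0.7])
```

With the weights `[-1, 0, 1]` the function reduces to a simple temporal difference; making the same weights free parameters is what turns the construction into a trainable one.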


HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features

Chengalvarayan

Deng

1997

IEEE Trans. Speech Audio Process.


In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, is automatically trained using the gradient descent method to minimize a measure of overall empirical error counts. A further generalization of the model allows integration of the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCC's). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCC's that have not been subject to optimization during training.


“…As part of a continuing trend to better characterize temporal variations in the signal, higher order time derivatives of signal measurements (Doddington 1989; Bocchieri and Doddington 1986; Furui 1986) were added to the signal model. The absolute measurements previously discussed can be thought of as zeroth order derivatives.…”

Section: Differentiation (mentioning)

confidence: 99%
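
The idea of stacking zeroth, first, and higher order time derivatives into one signal model can be sketched as follows. This is a hedged illustration only: the central-difference estimator, the function name `time_derivative`, and the toy input are assumptions, and the cited works may use other estimators such as regression over longer windows.

```python
def time_derivative(frames):
    """Central-difference estimate of the time derivative of a feature track.

    frames : list of frames, each a list of floats.
    At the first and last frame the missing neighbour is replaced by the
    frame itself (an edge-handling assumption of this sketch).
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


# Hypothetical one-dimensional measurements (the "zeroth order" features).
zeroth = [[0.0], [1.0], [4.0], [9.0], [16.0]]

first = time_derivative(zeroth)    # first-order derivative
second = time_derivative(first)    # second-order derivative

# Augmented feature vector per frame: absolute value plus both derivatives.
features = [z + f + s for z, f, s in zip(zeroth, first, second)]
```

Each frame of `features` holds the absolute measurement followed by its first and second derivative estimates, which is the usual way such derivatives are appended to a feature vector.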

Front end analysis of speech recognition: a review

Anusuya

Katti

2011

Int J Speech Technol

102


Automatic speech recognition (ASR) has made great strides with the development of digital signal processing hardware and software. Despite these advances, machines cannot match the performance of their human counterparts in accuracy and speed, especially in speaker-independent speech recognition. A significant portion of today's speech recognition research is therefore focused on the speaker-independent recognition problem. Before recognition, the speech signal has to be processed to obtain feature vectors, so front-end analysis plays an important role. The reasons are its wide range of applications and the limitations of available speech recognition techniques. In this report we briefly discuss the different aspects of front-end analysis for speech recognition, including sound characteristics, feature extraction techniques, and spectral representations of the speech signal. We also discuss the advantages and disadvantages of each feature extraction technique, along with the suitability of each method for particular applications.


“…The minimum error is obtained by choosing the ( ) smallest (zero in our case) eigenvalues and their corresponding eigenvectors as the ones to discard [8]. Since the number of largest (nonzero) eigenvalues is limited by the number ( ) when , the dimension of the subspace spanned by the eigenvectors corresponding to the largest eigenvalues can be extended up to ( ).…”

Section: Principal Component Analysis (mentioning)

confidence: 99%

The common vector approach and its relation to principal component analysis

Gülmezoğlu

Dzhafarov

Barkana

2001

IEEE Trans. Speech Audio Process.

117


The main point of the paper is to show the close relation between the nonzero principal components and the difference subspace together with the complementary close relation between the zero principal components and the common vector. A common vector representing each word-class is obtained from the eigenvectors of the covariance matrix of its own word-class; that is, the common vector is in the direction of a linear combination of the eigenvectors corresponding to the zero eigenvalues of the covariance matrix. The methods that use the nonzero principal components for recognition purposes suggest the elimination of all the features that are in the direction of the eigenvectors corresponding to the smallest eigenvalues (including the zero eigenvalues) of the covariance matrix whereas the common vector approach suggests the elimination of all the features that are in the direction of the eigenvectors corresponding to the largest, all nonzero eigenvalues of the covariance matrix.

Index Terms: common vector approach, speech recognition, subspace methods.
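
The relation described in this abstract can be made concrete with a small sketch. It assumes the Gram-Schmidt formulation of the difference subspace rather than an explicit eigendecomposition: the span of the difference vectors equals the span of the covariance eigenvectors with nonzero eigenvalues, so removing that span from any sample of the class leaves a component in the zero-eigenvalue directions. The function names and the toy data are hypothetical, and the example assumes fewer samples than dimensions so that the zero-eigenvalue subspace is not empty.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def difference_basis(samples, tol=1e-10):
    """Orthonormal basis of span{a_i - a_0} built by Gram-Schmidt."""
    basis = []
    for a in samples[1:]:
        v = [x - y for x, y in zip(a, samples[0])]
        for b in basis:
            c = dot(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip difference vectors that add no new direction
            basis.append([x / norm for x in v])
    return basis


def common_vector(sample, basis):
    """Subtract from `sample` its projection onto the difference subspace."""
    v = list(sample)
    for b in basis:
        c = dot(sample, b)
        v = [x - c * y for x, y in zip(v, b)]
    return v


# Hypothetical word-class: 3 samples in 3 dimensions that vary only in
# the first two coordinates.
samples = [[1.0, 2.0, 3.0], [2.0, 2.0, 3.0], [1.0, 4.0, 3.0]]

basis = difference_basis(samples)
commons = [common_vector(a, basis) for a in samples]
```

Every sample of the class yields the same vector in `commons`, which is why it can serve as a single representative of the class; the directions removed are exactly those in which the class samples differ.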
