ABSTRACT

This paper describes an attempt to design a knowledge-based large vocabulary speech recognition system. Our motivation is to replace features based on short-term spectra, such as Mel-frequency cepstral coefficients (MFCC), by features that explicitly represent some of the distinctive features of the speech signal. However, rather than attempting to compute acoustic correlates of these distinctive features, we have engineered an approach where neural networks are trained to map short-term spectral features to the posterior probability of some distinctive features. These probabilities are then used as features in a large vocabulary tied-state HMM-based recognizer. Experimental results on the Wall Street Journal task show that such a system, while not outperforming an MFCC-based system, generates very different error patterns. After combining the results of a baseline MFCC system with the results of several systems based on the proposed approach, we were able to obtain reductions in word error rate of 19% and 10% on the 5K and 20K tasks respectively over our best MFCC-based systems.