Abstract-We propose, implement, and evaluate a class of nonstationary-state hidden Markov models (HMM's) having each state associated with a distinct polynomial regression function of time plus white Gaussian noise. The model represents the transitional acoustic trajectories of speech in a parametric manner, and includes the standard stationary-state HMM as a special, degenerate case. We develop an efficient dynamic programming technique which includes the state sojourn time as an optimization variable, in conjunction with a state-dependent orthogonal polynomial regression method, for estimating the model parameters. Experiments on fitting models to speech data and on limited-vocabulary speech recognition demonstrate consistent superiority of these nonstationary-state HMM's over the traditional stationary-state HMM's.
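The per-state observation model described above (a polynomial in sojourn time plus white Gaussian noise) can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a plain power-basis polynomial by ordinary least squares rather than the orthogonal polynomial regression named in the abstract, it assumes the segment boundaries are already known (so the dynamic programming step is omitted), and the names `fit_trend`, `gaussian_loglik`, and the synthetic `segment` are hypothetical. Order 0 corresponds to the stationary-state special case.

```python
import numpy as np

def fit_trend(obs, order):
    """Least-squares polynomial fit over sojourn time t = 0..T-1 (illustrative only)."""
    t = np.arange(len(obs), dtype=float)
    # Design matrix with columns t^0, t^1, ..., t^order (power basis, an assumption)
    X = np.vander(t, order + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
    resid = obs - X @ coef
    var = float(np.mean(resid ** 2))  # ML estimate of the white-noise variance
    return coef, var

def gaussian_loglik(obs, coef, var):
    """Log-likelihood of a segment under trend-plus-white-Gaussian-noise."""
    t = np.arange(len(obs), dtype=float)
    X = np.vander(t, len(coef), increasing=True)
    resid = obs - X @ coef
    return float(-0.5 * (len(obs) * np.log(2 * np.pi * var) + np.sum(resid ** 2) / var))

# Synthetic one-dimensional segment with a linear trend (made-up data)
rng = np.random.default_rng(0)
segment = 1.0 + 0.3 * np.arange(20) + rng.normal(scale=0.5, size=20)

coef0, var0 = fit_trend(segment, order=0)  # stationary state: constant mean
coef1, var1 = fit_trend(segment, order=1)  # nonstationary state: linear trend
ll_stationary = gaussian_loglik(segment, coef0, var0)
ll_trended = gaussian_loglik(segment, coef1, var1)
```

Because the constant-mean model is nested inside the linear-trend model, the fitted log-likelihood on the same segment cannot decrease when the order is raised; whether that gain carries over to recognition accuracy is what the reported experiments address.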
In this study, we make a major extension of the nonstationary-state or trended hidden Markov model (HMM) from the previous single-trend formulation [2], [3] to the current mixture-trended one. This extension is motivated by the observation of wide variations in the trajectories of the acoustic data in fluent, speaker-independent speech associated with a fixed underlying linguistic unit. It is also motivated by potential use of mixtures of trend functions to characterize heterogeneous time-varying data generated from distinctive sources such as the speech signals collected from different microphones or from different telephone channels. We show how HMM's with mixtures of trend functions can be implemented simply in the already well-established single-trend HMM framework via the device of expanding each state into a set of parallel states. Details of a maximum-likelihood-based (ML-based) algorithm are given for estimating state-dependent mixture trajectory parameters in the model. Experimental results on the task of classifying speaker-independent vowels excised from the TIMIT data base demonstrate consistent performance improvement using phonemic mixture-trended HMM's over their single-trend counterpart.
In this study we extend the nonstationary-state (trended) HMM from the single-trend formulation [2] to the mixture-trend one. This extension is motivated by the observation of wide variations in the trajectories of the acoustic data in fluent, speaker-independent speech associated with a given underlying linguistic unit. We show how HMMs with mixtures of trend functions can be implemented simply in the already well-established singly trended HMM framework via the device of expanding each state into a set of parallel states. Details of a maximum-likelihood based algorithm are given for estimating state-dependent mixture trajectory parameters in the model. Experimental results on the task of classifying speaker-independent vowels excised from the TIMIT database demonstrate consistent performance improvement using phonemic mixture-trended HMMs over their singly-trended counterpart.
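One way to picture the device of expanding each state into a set of parallel states is the sketch below. It is an assumed construction, not taken from the paper: each of the N original states is replaced by M copies (one per trend component), a self-loop stays on the same copy so that one trend function governs the whole visit, and a transition into a different state is split across that state's copies according to hypothetical mixture weights `c`. The function name `expand_states` and the example matrices are illustrative.

```python
import numpy as np

def expand_states(A, c):
    """Expand an N-state transition matrix into N*M parallel states.

    A : (N, N) transition matrix of the original HMM.
    c : (N, M) mixture weights, each row summing to 1 (assumed parameterization).
    """
    N, M = c.shape
    E = np.zeros((N * M, N * M))
    for i in range(N):
        for m in range(M):
            src = i * M + m
            for j in range(N):
                if j == i:
                    # Self-loop: remain on the same parallel copy (same trend)
                    E[src, src] = A[i, i]
                else:
                    # Entering state j: choose one of its copies with weight c[j, k]
                    for k in range(M):
                        E[src, j * M + k] = A[i, j] * c[j, k]
    return E

# Made-up 3-state left-to-right model with 2 trend components per state
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
c = np.array([[0.5, 0.5],
              [0.2, 0.8],
              [0.9, 0.1]])
E = expand_states(A, c)
```

Under this construction every row of the expanded matrix still sums to one, so the result can be handled by the same machinery as a single-trend model with N*M states, which is the practical appeal of the device.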
The formulation of the hidden Markov model (HMM) has been successfully used in automatic speech recognition for almost two decades. In the standard formulation, the individual states in the HMM are each associated with a generally distinct but stationary stochastic process. This makes the standard HMM inadequate for representing the nonstationary property of the many speech segments intended to be described by the HMM-state statistics. A generalized HMM has been developed to overcome this inadequacy by introducing state-dependent polynomial regression functions on time that serve as the time-varying means in the HMM’s Gaussian output distributions [e.g., L. Deng, Signal Process. 27, 65–78 (1992)]. Recently, Aksmanovic and Deng extended the above model so that the state-dependent nonstationary process contains multiple tracks of the polynomial functions. This new parametric class of nonstationary-state HMMs has been implemented and evaluated. Experiments on fitting models to speech data, on limited-vocabulary word recognition, and on phonetic classification demonstrated advantages of the nonstationary-state HMMs over the traditional stationary-state HMMs. Details of the model implementation and of the experimental results will be described. In particular, the focus will be on comparisons between uses of single-track and multiple-track regression functions defined within the HMM states, and on comparisons among uses of varying orders of the state-dependent polynomial regression functions.