It is generally believed that the transition probabilities in a hidden Markov model (HMM) play only a limited role in the speech decoding process. In this paper, through a series of recognition experiments on Wall Street Journal (WSJ) read speech and SVitchboard (SVB) conversational telephone speech, we find that the HMM transition probabilities may be more important than previously thought. The experiments include: (1) setting or not setting all outgoing transition probabilities equal; (2) introducing word-final triphones and re-estimating their transition probabilities; (3) adding, besides the grammar factor and insertion penalty, a third decoding parameter called the transition factor, which scales the transition probability score during decoding. The results of these three experiments enable us to improve the word accuracy of the WSJ and SVB speech recognition tasks by 0.7% and 5.3% absolute, respectively, compared with their baseline models, in which all transition probabilities are simply set to 0.5.
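To make the idea of a transition factor concrete, here is a minimal sketch of log-domain Viterbi decoding in which every log transition score is multiplied by a scale. It is an illustration only, not the paper's decoder: the function name `viterbi`, the argument `transition_factor`, and the toy two-state model are all assumptions, and the grammar factor and insertion penalty are left out.

```python
import math

NEG_INF = float("-inf")

def viterbi(log_init, log_trans, log_obs, transition_factor=1.0):
    """Return (best log score, best state path) for one observation sequence.

    log_init[i]     : log initial probability of state i
    log_trans[i][j] : log transition probability from state i to state j
    log_obs[t][j]   : log acoustic likelihood of frame t in state j
    transition_factor (assumed name): scale applied to each log transition score
    """
    def scaled(lp):
        # Keep forbidden transitions forbidden, even when the factor is 0.
        return NEG_INF if lp == NEG_INF else transition_factor * lp

    n_states = len(log_init)
    delta = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    backpointers = []
    for t in range(1, len(log_obs)):
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j under the scaled transition score.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + scaled(log_trans[i][j]))
            pointers.append(best_i)
            new_delta.append(delta[best_i] + scaled(log_trans[best_i][j])
                             + log_obs[t][j])
        delta = new_delta
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    best_score = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best_score, path

# Toy two-state model (made-up numbers, for illustration only).
log = math.log
toy_init = [log(0.5), log(0.5)]
toy_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
toy_obs = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]

score_scaled_1, path_scaled_1 = viterbi(toy_init, toy_trans, toy_obs, 1.0)
score_scaled_0, path_scaled_0 = viterbi(toy_init, toy_trans, toy_obs, 0.0)
```

In this toy case a factor of 1.0 keeps the decoder in state 0 for both frames, while a factor of 0.0 ignores the transition scores and lets the second frame switch to state 1. The sketch shows the mechanism only; it says nothing about which factor values work well on real data.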
This paper proposes a new hidden Markov model (HMM), which we call the speaker-ensemble HMM (SE-HMM). An SE-HMM is a multi-path HMM in which each path is an HMM constructed from the training data of a different speaker. SE-HMM may be considered a form of template-based acoustic model in which speaker-specific acoustic templates are compressed statistically into speaker-specific HMMs. However, one has the flexibility of building SE-HMM at various levels of compression: SE-HMM may be built for a triphone state, a triphone, a whole utterance, or other convenient phonetic units. As a result, SE-HMM contains more details than a conventional HMM, but is much smaller than common template-based acoustic models. Furthermore, the construction of SE-HMM is simple, and since it is still an HMM, its construction and computation are well supported by common HMM toolkits such as HTK. The proposed SE-HMM was evaluated on the Resource Management and Wall Street Journal tasks, and it consistently gives better word recognition results than a conventional HMM.
Index Terms: detailed acoustic modeling, template-based automatic speech recognition, speaker-ensemble acoustic model
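The multi-path structure described above can be sketched as a transition matrix with one block per speaker. This is an assumed construction, not the paper's procedure: each speaker HMM is given as a square matrix whose first and last rows are non-emitting entry and exit states (the layout HTK uses), the paths share one entry and one exit state, and the entry weights default to uniform because the abstract does not say how they are set. The function name `combine_paths` and the example numbers are hypothetical.

```python
def combine_paths(speaker_trans, path_weights=None):
    """Merge per-speaker transition matrices into one multi-path matrix.

    speaker_trans : list of square matrices, one per speaker. Row/column 0 is
                    a non-emitting entry state and the last row/column is a
                    non-emitting exit state.
    path_weights  : assumed prior weight of each path (uniform if omitted).
    Direct entry-to-exit transitions are assumed absent and are not copied.
    """
    n_paths = len(speaker_trans)
    if path_weights is None:
        path_weights = [1.0 / n_paths] * n_paths

    n_emitting = sum(len(a) - 2 for a in speaker_trans)
    size = n_emitting + 2
    combined = [[0.0] * size for _ in range(size)]

    offset = 1  # index of the first emitting state of the current path
    for a, weight in zip(speaker_trans, path_weights):
        n = len(a) - 2
        for j in range(1, n + 1):
            # Shared entry state branches into each path with its weight.
            combined[0][offset + j - 1] = weight * a[0][j]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Within-path transitions are copied; paths never cross.
                combined[offset + i - 1][offset + j - 1] = a[i][j]
            # Every path leaves through the shared exit state.
            combined[offset + i - 1][size - 1] = a[i][n + 1]
        offset += n
    return combined

# Two hypothetical speaker HMMs with two emitting states each.
speaker_a = [[0.0, 1.0, 0.0, 0.0],
             [0.0, 0.6, 0.4, 0.0],
             [0.0, 0.0, 0.7, 0.3],
             [0.0, 0.0, 0.0, 0.0]]
speaker_b = [[0.0, 1.0, 0.0, 0.0],
             [0.0, 0.5, 0.5, 0.0],
             [0.0, 0.0, 0.8, 0.2],
             [0.0, 0.0, 0.0, 0.0]]

se_hmm_trans = combine_paths([speaker_a, speaker_b])
```

Because the off-diagonal blocks stay zero, a state sequence that enters one speaker's path remains in it until the exit state, which is one way to read "each path is an HMM constructed from the training data of a different speaker". The emission distributions would be carried over per state and are omitted here.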