In recent years, many alternative models have been proposed to address some of the shortcomings of the hidden Markov model, currently the most popular approach to speech recognition. In particular, a variety of models that could be broadly classified as segment models have been described for representing a variable-length sequence of observation vectors in speech recognition applications. Since these approaches have many aspects in common, including the general recognition and training problems, it is useful to consider them in a unified framework. Thus, the goal of this paper is to describe a general stochastic model that encompasses most of the models proposed in the literature, pointing out similarities among the models in terms of correlation and parameter-tying assumptions, and drawing analogies between segment models and hidden Markov models. In addition, we summarize experimental results assessing different modeling assumptions, and point out remaining open questions.
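The structural contrast the abstract draws can be sketched in a few lines. A minimal illustration, under assumed stand-in scoring functions (`frame_logprob`, `joint_logprob`, `duration_logprob` are hypothetical placeholders, not part of any paper's actual model): an HMM state scores a variable-length segment frame by frame with an implicit geometric duration from self-loops, whereas a segment model scores the whole observation sequence jointly with an explicit duration distribution.

```python
def hmm_segment_score(frames, frame_logprob, self_loop_logprob):
    """HMM-style score: independent per-frame output log-probabilities,
    with duration modeled implicitly by repeated self-loop transitions."""
    output = sum(frame_logprob(f) for f in frames)
    duration = self_loop_logprob * (len(frames) - 1)  # geometric duration
    return output + duration

def segment_model_score(frames, joint_logprob, duration_logprob):
    """Segment-model-style score: an explicit duration distribution plus a
    joint log-probability over the entire variable-length segment, which
    can capture correlation across frames that the HMM assumes away."""
    return duration_logprob(len(frames)) + joint_logprob(frames)
```

The key difference is that `joint_logprob` sees all frames of the segment at once, so intra-segment correlation assumptions live there rather than being forced into frame independence.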
This paper presents our recent effort to improve our Arabic Broadcast News (BN) recognition system by using thousands of hours of untranscribed Arabic audio for unsupervised training. Unsupervised training is first carried out on the 1,900-hour English Topic Detection and Tracking (TDT) corpus and is compared with the lightly supervised training method that we used for the DARPA EARS evaluations. The comparison shows that unsupervised training produces a 21.7% relative reduction in word error rate (WER), comparable to the gain obtained with light supervision. The same unsupervised training strategy applied to a similar amount of Arabic BN data produces an 11.6% relative gain. The gain, though considerable, is substantially smaller than that observed on the English data. Our initial work towards understanding the reasons for this difference is also described.
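One iteration of unsupervised training of the kind described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: `decode` and `train` are hypothetical stand-ins for a full ASR toolkit, and the confidence-filtering threshold is an assumed illustration.

```python
def unsupervised_iteration(seed_model, untranscribed_audio,
                           decode, train, min_confidence=0.9):
    """Decode untranscribed audio with a seed model, keep only the
    high-confidence automatic transcripts, and retrain on them.

    decode(model, utterance) -> (hypothesis_text, confidence) [assumed]
    train(pairs)             -> new acoustic model            [assumed]
    """
    selected = []
    for utterance in untranscribed_audio:
        hypothesis, confidence = decode(seed_model, utterance)
        if confidence >= min_confidence:  # filter likely recognition errors
            selected.append((utterance, hypothesis))
    return train(selected)
```

In practice the loop is repeated: the retrained model re-decodes the pool, usually recovering more usable transcripts on each pass.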
This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
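The rescoring-and-combination step of this formalism can be sketched generically. A minimal sketch, assuming log-domain scores and externally tuned combination weights (the `rescorers` functions and `weights` are hypothetical placeholders for the participating systems; the paper's actual weight optimization is not shown):

```python
def combine_nbest(nbest, rescorers, weights):
    """Pick the best hypothesis from an N-best list after rescoring.

    nbest:     list of (sentence, first_pass_score) from the first system
    rescorers: functions mapping a sentence to a log-domain score,
               one per additional system
    weights:   one weight per score (first-pass plus each rescorer)
    """
    best_sentence, best_score = None, float("-inf")
    for sentence, first_pass in nbest:
        scores = [first_pass] + [r(sentence) for r in rescorers]
        total = sum(w * s for w, s in zip(weights, scores))  # weighted sum
        if total > best_score:
            best_sentence, best_score = sentence, total
    return best_sentence
```

Because each rescorer only evaluates the N candidate sentences rather than searching the full word lattice, an expensive model such as the stochastic segment model does far less work than in first-pass decoding.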