To achieve reasonable accuracy in large vocabulary speech recognition systems, it is important to use detailed acoustic models together with good long span language models. For example, in the Wall Street Journal (WSJ) task both cross-word triphones and a trigram language model are necessary to achieve state-of-the-art performance. However, when using these models, the size of a pre-compiled recognition network can make a standard Viterbi search infeasible and hence, either multiple-pass or asynchronous stack decoding schemes are typically used. In tl:fis paper, we show that timesynchronous one-pass decoding using cross-word triphones and a trigram language model can be implemented using a dynamically built tree-structured network. This approach avoids the compromises inherent in using fast-matches or preliminary passes and is relatively efficient in implementation. It was included in the HTK large vocabulary speech recognition system used for the 1993 ARPA WSJ evaluation and experimental results are presented for that task.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.