Proceedings of the Workshop on Human Language Technology (HLT '94), 1994
DOI: 10.3115/1075812.1075905

A one pass decoder design for large vocabulary recognition

Abstract: To achieve reasonable accuracy in large vocabulary speech recognition systems, it is important to use detailed acoustic models together with good long span language models. For example, in the Wall Street Journal (WSJ) task both cross-word triphones and a trigram language model are necessary to achieve state-of-the-art performance. However, when using these models, the size of a pre-compiled recognition network can make a standard Viterbi search infeasible and hence, either multiple-pass or asynchronous stack …
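
The search problem the abstract describes can be made concrete with a small sketch. The toy below is illustrative only, not the paper's decoder: `ARCS` is a hypothetical static graph and `emission_logp` a made-up score, whereas a real system would use cross-word triphone likelihoods over a dynamically built tree-structured network. It shows the time-synchronous Viterbi token passing with beam pruning that such one-pass decoders build on:

```python
import math

# Minimal time-synchronous Viterbi beam search using token passing.
# state -> list of (next_state, transition log-prob, emitted symbol)
ARCS = {
    0: [(1, math.log(0.6), "a"), (2, math.log(0.4), "b")],
    1: [(1, math.log(0.5), "a"), (3, math.log(0.5), "c")],
    2: [(3, math.log(1.0), "c")],
    3: [],  # final state
}

def emission_logp(symbol, obs):
    # Toy emission score; a real decoder would use acoustic likelihoods
    # from context-dependent (e.g. cross-word triphone) HMM states.
    return math.log(0.8) if symbol == obs else math.log(0.1)

def viterbi_beam(observations, beam=10.0):
    # One best token per state: token = (log score, label history).
    tokens = {0: (0.0, [])}
    for obs in observations:
        new_tokens = {}
        for state, (score, hist) in tokens.items():
            for nxt, trans_lp, sym in ARCS[state]:
                s = score + trans_lp + emission_logp(sym, obs)
                if nxt not in new_tokens or s > new_tokens[nxt][0]:
                    new_tokens[nxt] = (s, hist + [sym])
        if not new_tokens:
            return None
        # Beam pruning keeps the active state set small; this, rather
        # than pre-compiling the full network, is what makes a single
        # time-synchronous pass feasible.
        best = max(s for s, _ in new_tokens.values())
        tokens = {st: th for st, th in new_tokens.items()
                  if th[0] >= best - beam}
    return tokens.get(3)  # best token reaching the final state, if any

print(viterbi_beam(["a", "a", "c"]))
```
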

Cited by 78 publications (55 citation statements)
References 14 publications (10 reference statements)

“…Efficient use of language models in speech recognizers [20,19,17] requires that the context dependent states representing different histories during search can be appropriately shared among multiple paths. This applies to both conventional back-off n-gram and feedforward NNLMs.…”
Section: History Context Clustering for RNNLMs
Confidence: 99%
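
To see why this sharing matters, note that under a back-off trigram only the last two words condition the next-word probability, so paths with different full histories but the same truncated context can map to one search state. A minimal sketch (the helper `lm_state` is hypothetical, not from the cited systems):

```python
# Collapsing word histories to shared n-gram states: under a trigram
# model (order 3), the state is just the last two words, so distinct
# paths that agree on those words share one downstream search state.

def lm_state(history, order=3):
    # The state is the last (order - 1) words of the history.
    return tuple(history[-(order - 1):])

paths = [
    ["the", "cat", "sat", "on", "the"],
    ["a", "dog", "lay", "on", "the"],
]

states = {lm_state(h) for h in paths}
print(states)  # both histories collapse to the single state ('on', 'the')
```

Both histories collapse to one state, so the decoder needs only one copy of the downstream search space for them.
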
“…To deal with this, a number of different architectural approaches have evolved. For Viterbi decoding, the search space can either be constrained by maintaining multiple hypotheses in parallel [173,191,192] or it can be expanded dynamically as the search progresses [7,69,130,132]. Alternatively, a completely different approach can be taken where the breadth-first approach of the Viterbi algorithm is replaced by a depth-first search.…”
Section: Decoding and Lattice Generation
Confidence: 99%
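
The contrast between these two families can be sketched briefly. Below is a toy best-first ("stack") decoder over a hypothetical word graph: unlike the time-synchronous Viterbi sketch earlier, partial hypotheses of different lengths compete on a single priority queue, and the search proceeds depth-first along the most promising path. All names here are illustrative assumptions:

```python
import heapq
import math

# Toy word graph: word -> list of (next word, transition log-prob)
GRAPH = {
    "<s>": [("the", math.log(0.9)), ("a", math.log(0.1))],
    "the": [("cat", math.log(0.5)), ("dog", math.log(0.5))],
    "a":   [("cat", math.log(0.4)), ("dog", math.log(0.6))],
    "cat": [("</s>", math.log(1.0))],
    "dog": [("</s>", math.log(1.0))],
}

def stack_decode():
    # heapq is a min-heap, so push negated log scores.
    heap = [(-0.0, ["<s>"])]
    while heap:
        neg_score, hyp = heapq.heappop(heap)
        if hyp[-1] == "</s>":
            # Log-probs are <= 0, so the first complete hypothesis
            # popped is guaranteed to be the best one.
            return -neg_score, hyp
        for nxt, lp in GRAPH.get(hyp[-1], []):
            heapq.heappush(heap, (neg_score - lp, hyp + [nxt]))
    return None

print(stack_decode())
```
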
“…However, for continuous speech recognition systems using higher order language models the linguistic state cannot be determined locally and the word boundaries are uncertain. Several solutions based on creating copies of the PPT for each unique linguistic context solve this problem [8,9,10]; however, these approaches create redundant sub-tree computations, the number of which corresponds to the number of active linguistic contexts. A computation is redundant when a sub-tree instance is dominated by another instance of that sub-tree.…”
Section: Re-entrant vs. Non-Re-entrant Trees
Confidence: 99%
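
A rough illustration of this redundancy, assuming a toy lexicon and hypothetical helper names: with naive per-context copies of a pronunciation prefix tree (PPT), every active linguistic context pays the full cost of the tree again, which is exactly the duplicated sub-tree work the passage describes.

```python
# Build a small pronunciation prefix tree and count how much work
# naive per-context copies duplicate across active LM histories.

LEXICON = {"cat": "k ae t", "cab": "k ae b", "dog": "d ao g"}

def build_ppt(lexicon):
    root = {}
    for word, pron in lexicon.items():
        node = root
        for phone in pron.split():
            node = node.setdefault(phone, {})
        node["#word"] = word  # leaf marker: a word ends here
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for k, c in node.items() if k != "#word")

ppt = build_ppt(LEXICON)
contexts = ["<s> the", "<s> a", "the big"]  # hypothetical active histories
print("nodes per tree copy:", count_nodes(ppt))
print("total with", len(contexts), "copies:",
      len(contexts) * count_nodes(ppt))
# A dominated copy (one whose root score is beaten by another copy of
# the same sub-tree) can never win, so its evaluation is pure waste.
```
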