IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.
DOI: 10.1109/asru.2001.1034620

Incremental language models for speech recognition using finite-state transducers

Abstract: In the context of the weighted finite-state transducer approach to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sources, modeled as transducers, are too large to be composed and optimized. While the recognition decoder perceives a …
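A minimal sketch of the kind of on-demand (lazy) composition such a strategy relies on: states of the composed network are materialized only when the decoder actually visits them. The Python toy below assumes epsilon-free transducers over the tropical semiring and is an illustration of the general technique, not the paper's algorithm.

    # Minimal sketch of on-the-fly (lazy) WFST composition.
    # Assumes epsilon-free transducers in the tropical semiring; each
    # transducer is a dict: state -> [(ilabel, olabel, weight, nextstate)].

    class LazyCompose:
        """Expand states of A o B only when the decoder asks for them."""

        def __init__(self, arcs_a, arcs_b, start_a, start_b):
            self.arcs_a = arcs_a
            self.arcs_b = arcs_b
            self.start = (start_a, start_b)
            self._cache = {}              # composed state -> its arcs

        def arcs(self, state):
            if state in self._cache:      # already expanded on an earlier visit
                return self._cache[state]
            sa, sb = state
            out = []
            for ilab, mid, wa, na in self.arcs_a.get(sa, []):
                for ilab_b, olab, wb, nb in self.arcs_b.get(sb, []):
                    if mid == ilab_b:     # output of A must match input of B
                        out.append((ilab, olab, wa + wb, (na, nb)))
            self._cache[state] = out
            return out

    # Toy example: A maps 'a' -> 'x', B maps 'x' -> '1'.
    A = {0: [('a', 'x', 0.5, 1)]}
    B = {0: [('x', '1', 0.25, 1)]}
    comp = LazyCompose(A, B, 0, 0)
    print(comp.arcs(comp.start))          # [('a', '1', 0.75, (1, 1))]

Because arcs are requested only for states reached by live hypotheses, the full product automaton is never built, which is what makes such approaches viable when the component transducers are too large to compose statically.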

Cited by 30 publications (8 citation statements)
References 5 publications (10 reference statements)
“…The most commonly used asynchronous method is A* [8,9,10]. In the field of WFST-based speech recognition, several algorithms have been proposed to reduce time and memory requirements: [11] and [12] propose on-the-fly composition algorithms, while [13] proposes to factor the language model into smaller components.…”
Section: Classical Approaches
confidence: 99%
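The statement above names A* as the most common asynchronous method. For reference, a generic A* best-first search looks like the sketch below; it assumes non-negative edge costs and an admissible (non-overestimating) heuristic, and it is not tied to any of the cited decoders.

    # Generic A* best-first search over a weighted graph.
    # neighbors(s) yields (next_state, edge_cost); heuristic(s) must not
    # overestimate the remaining cost, so the first goal popped is optimal.
    import heapq

    def a_star(start, goal, neighbors, heuristic):
        frontier = [(heuristic(start), 0.0, start)]   # entries are (f = g + h, g, state)
        best_g = {start: 0.0}
        while frontier:
            f, g, s = heapq.heappop(frontier)
            if s == goal:
                return g                              # cheapest cost to the goal
            if g > best_g.get(s, float("inf")):
                continue                              # stale queue entry
            for t, cost in neighbors(s):
                g2 = g + cost
                if g2 < best_g.get(t, float("inf")):
                    best_g[t] = g2
                    heapq.heappush(frontier, (g2 + heuristic(t), g2, t))
        return None                                   # goal unreachable

    graph = {"s": [("a", 1.0), ("g", 5.0)], "a": [("g", 1.0)], "g": []}
    print(a_star("s", "g", lambda s: graph[s], lambda s: 0.0))  # prints 2.0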
“…The WFST G_uni represents a unigram language model, and the WFST G_tri/uni represents a trigram model divided by the unigram probability. This scheme, as studied in [3], allows some static language model information to be included, and adding this look-ahead information can improve search performance.…”
Section: WFST Combinations Evaluated
confidence: 99%
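The factoring quoted above is exact in the tropical semiring, where composition adds negative log probabilities. A small numeric check in Python, with made-up probabilities for illustration:

    import math

    p_uni = 0.01    # P(w): unigram probability, an arc weight in G_uni
    p_tri = 0.20    # P(w | u, v): the trigram score we ultimately want

    w_uni = -math.log(p_uni)            # applied early, in the static part
    w_corr = -math.log(p_tri / p_uni)   # correction arc in G_tri/uni
                                        # (negative here, since the trigram
                                        # context boosts the word)

    # The two weights sum to exactly -log P(w | u, v):
    assert abs((w_uni + w_corr) - (-math.log(p_tri))) < 1e-12

Because the unigram weight is available in the statically expanded part of the network, it acts as a look-ahead score that sharpens pruning before the trigram correction is ever applied.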
“…A drawback of this WFST approach is that access to the original knowledge sources is lost once the final network has been composed and optimized. On-the-fly composition and optimization algorithms have been developed by others [3,4,5,6] as a way of increasing flexibility within the WFST paradigm. However, one disadvantage of such on-line algorithms is that some of the optimization power available in their static equivalents is sacrificed.…”
Section: Introduction
confidence: 99%
“…One possible solution to this problem is to perform on-the-fly transducer composition during decoding. Acoustic, phonetic, and lexical resources may still be composed and optimised off-line, while the language model transducer is composed locally and dynamically at run time [3,19,9]. With this approach, we avoid composing the parts of the search space that are not traversed by any hypothesis.…”
Section: Future Development
confidence: 99%
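A rough Python sketch of what this buys during decoding: language model states (here, just a truncated word history) are extended only for hypotheses that survive the pruning beam, so untraversed parts of the search space are never composed. The lm_cost interface and all costs below are assumptions for illustration, not a real decoder API.

    import math

    def lm_cost(history, word):
        # Stand-in for an n-gram lookup; uniform over a toy 3-word vocabulary.
        return -math.log(1.0 / 3.0)

    def decode_step(hyps, word_candidates, beam):
        """hyps: list of (cost, history). Extend every hypothesis, then prune."""
        extended = [
            (cost + ac + lm_cost(hist, w), hist[-1:] + (w,))
            for cost, hist in hyps
            for w, ac in word_candidates       # (word, acoustic + lexicon cost)
        ]
        best = min(c for c, _ in extended)
        return [(c, h) for c, h in extended if c <= best + beam]

    hyps = [(0.0, ("<s>",))]
    hyps = decode_step(hyps, [("a", 1.2), ("b", 0.7), ("c", 4.0)], beam=2.0)
    print(hyps)   # "c" falls outside the beam; nothing is expanded from it later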