An overview of decoding techniques for large vocabulary continuous speech recognition

Aubert, Xavier L.

doi:10.1006/csla.2001.0185

Cited by 89 publications

(53 citation statements)

References 38 publications

“…Time-asynchronous decoding pursues a depth-first strategy, in which the best hypotheses are explored forward in time before being compared to their competitors. The distinction between time-synchronous and time-asynchronous decoding is not absolute [12]. In practice, time-synchronous decoders limit the number of hypotheses under examination at any given point using a process called beam pruning, which eliminates unpromising hypotheses from consideration.…”

Section: Recognition (mentioning)

confidence: 99%
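
As an illustration of the beam pruning described in the statement above, the following is a minimal Python sketch of a time-synchronous Viterbi pass over a toy HMM. It is not taken from the cited works: the function names, the dictionary-based model layout, and the use of log-probabilities with a fixed beam width are all assumptions made for the example.

```python
import math


def beam_prune(hyps, beam):
    """Keep only hypotheses whose log score is within `beam` of the best one."""
    if not hyps:
        return {}
    best = max(hyps.values())
    return {state: score for state, score in hyps.items() if score >= best - beam}


def viterbi_beam(transitions, emissions, observations, start, beam):
    """Time-synchronous Viterbi search with beam pruning (hypothetical toy model).

    transitions: {state: {next_state: log_prob}}
    emissions:   {state: {symbol: log_prob}}
    Returns the surviving states and their log scores after the last frame.
    """
    active = {start: 0.0}
    for obs in observations:
        expanded = {}
        # All active hypotheses are advanced by one frame before any comparison.
        for state, score in active.items():
            for nxt, trans_lp in transitions.get(state, {}).items():
                emit_lp = emissions[nxt].get(obs, -math.inf)
                cand = score + trans_lp + emit_lp
                # Viterbi recombination: keep the best path into each state.
                if cand > expanded.get(nxt, -math.inf):
                    expanded[nxt] = cand
        # Beam pruning: drop hypotheses that fall too far behind the best one.
        active = beam_prune(expanded, beam)
    return active
```

A narrow beam keeps fewer hypotheses alive per frame and so runs faster, at the risk of discarding a path that would have turned out best later (a search error); a wide beam approaches exhaustive Viterbi search.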

“…LVCSR is an incredibly complex process. In [12], a "trilogy" of factors is identified as being responsible for the best-performing LVCSR systems: combination of multiple algorithms, clever design cooperative with the hardware, careful parameter tuning. If any of these factors goes awry during the process of integrating the ASR and indexing system, sub-optimal performance could result.…”

Section: Strategies For Combining ASR and IR (mentioning)

confidence: 99%

“…If a resource used by the IR component could be more effectively exploited by the ASR component, systems should be designed to use it there. In particular, the trend towards producing lattice representations and also to integrating long range syntactic dependencies into the decoding process, as mentioned by [12], is an important one.…”

Section: Strategies For Combining ASR and IR (mentioning)

confidence: 99%

Automatic Summarization

Larson

2012

FNT in Information Retrieval

“…Nevertheless, the complexity of acoustic and language models used in speech recognition tasks still imposes growing requirements for the efficiency and accuracy of LVCSR decoders, and fosters the development of new approaches and techniques such as, e.g. cross-word acoustic models and long-span language models, already resulted in the development of several solutions for the speech-decoding problem [1,2,5,6,8,10,21,22].…”

Section: Introduction (mentioning)

confidence: 99%

Novel LVCSR Decoder Based on Perfect Hash Automata and Tuple Structures – SPREAD –

Rojc

Kačič

2014

IJACSA

Abstract. The paper presents the novel design of a one-pass large vocabulary continuous-speech recognition decoder engine, named SPREAD. The decoder is based on a time-synchronous beam-search approach, including statically expanded cross-word triphone contexts. An approach using efficient tuple structures is proposed for the construction of the complete search-network. The foremost benefits are the important space savings and higher processing speed, and the compact and reduced size of the tuple structure, especially when exploiting the structure of the key. In this way, the time needed to load the ASR search-network into memory is also significantly reduced. Further, the paper proposes and presents the complete methodology for compiling general ASR knowledge sources into tuple structures. Additionally, the beam search is enhanced with the novel implementation of a bigram language model Look-Ahead technique, by using tuple structures and a caching scheme. The SPREAD LVCSR decoder is based on a token-passing algorithm, capable of restricting its search-space by several types of token pruning. By using the presented language model Look-Ahead technique, it is possible to increase the number of tokens that can be pruned without loss of decoding precision.
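
The general idea behind bigram language model look-ahead with caching can be sketched as follows. This is a generic illustration, not SPREAD's tuple-structure implementation: the toy lexicon, the bigram table, and the function names are hypothetical, and Python's `functools.lru_cache` stands in for whatever caching scheme the decoder actually uses.

```python
import math
from functools import lru_cache

# Hypothetical toy lexicon: word -> phone sequence.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "cab": ("k", "ae", "b"),
    "dog": ("d", "ao", "g"),
}

# Hypothetical bigram log-probabilities: (previous word, word) -> log P.
BIGRAM = {
    ("the", "cat"): math.log(0.5),
    ("the", "cab"): math.log(0.1),
    ("the", "dog"): math.log(0.4),
}


def words_below(prefix):
    """Words whose pronunciation starts with the given phone prefix."""
    return [w for w, phones in LEXICON.items() if phones[:len(prefix)] == prefix]


@lru_cache(maxsize=None)
def lookahead(prev_word, prefix):
    """Best bigram log-probability still reachable from a prefix-tree node.

    `prefix` is a tuple of phones identifying the node; results are cached
    per (previous word, node) so repeated queries cost a dictionary lookup.
    """
    scores = [BIGRAM.get((prev_word, w), -math.inf) for w in words_below(prefix)]
    return max(scores, default=-math.inf)
```

In a token-passing decoder, adding `lookahead(prev_word, prefix)` to a token's score before the word identity is known lets pruning take the language model into account early: a token inside the prefix shared by "cat" and "cab" is judged by the better of the two continuations rather than by acoustics alone.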

“…Typically these knowledge sources are represented in the form of hidden Markov models (HMM), pronunciation lexica, and N-gram language models. The means for combining these knowledge sources and efficient decoding of the acoustic input is a demanding task and a range of optimisation techniques and heuristics are employed to achieve lower computational and memory requirements with minimal sacrifice to recognition accuracy [1]. In this paper we present the "Juicer" decoding software that has been developed at IDIAP.…”

Section: Introduction (mentioning)

confidence: 99%

Juicer: A Weighted Finite-State Transducer Speech Decoder

Moore

Dines

Magimai.-Doss

et al. 2006

Machine Learning for Multimodal Interaction

Abstract. A major component in the development of any speech recognition system is the decoder. As task complexities and, consequently, system complexities have continued to increase, the decoding problem has become an increasingly significant component in the overall speech recognition system development effort, with efficient decoder design contributing to significantly improve the trade-off between decoding time and search errors. In this paper we present the "Juicer" (from transducer) large vocabulary continuous speech recognition (LVCSR) decoder based on the weighted finite-state transducer (WFST). We begin with a discussion of the need for open source, state-of-the-art decoding software in LVCSR research and how this led to the development of Juicer, followed by a brief overview of decoding techniques and major issues in decoder design. We present Juicer and its major features, emphasising its potential not only as a critical component in the development of LVCSR systems, but also as an important research tool in itself, being based around the flexible WFST paradigm. We also provide results of benchmarking tests that have been carried out to date, demonstrating that in many respects Juicer, while still in its early development, is already achieving state-of-the-art performance. These benchmarking tests not only demonstrate the utility of Juicer in its present state, but are also being used to guide future development; hence, we conclude with a brief discussion of some of the extensions that are currently under way or being considered for Juicer.
