2019
DOI: 10.48550/arxiv.1907.01030
Preprint

LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

Abstract: LSTM-based language models are an important part of modern LVCSR systems, as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but recombine hypotheses that share the last two words; afterwards we rescore the resulting lattice. We run our systems on …
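The recombination step mentioned in the abstract can be illustrated with a minimal sketch: during first-pass beam search with the LSTM-LM, hypotheses that end in the same two words are merged, keeping only the best-scoring one (the pruned alternatives would normally survive as lattice arcs for the later rescoring pass). The names below (Hypothesis, recombine) are illustrative and not taken from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: tuple             # word sequence decoded so far
    score: float             # combined acoustic + LM score (higher is better)
    lm_state: object = None  # LSTM hidden state carried by the surviving hypothesis

def recombine(hyps):
    """Keep the best hypothesis per (last two words) key.

    In a real decoder the dropped alternatives would be kept as lattice arcs
    so that the second-pass rescoring can still reach them; here they are
    simply discarded to keep the sketch short.
    """
    best = {}
    for hyp in hyps:
        key = hyp.words[-2:]  # recombination key: the last two words
        if key not in best or hyp.score > best[key].score:
            best[key] = hyp
    return list(best.values())

# Usage example: the first two hypotheses share the last two words and are merged.
hyps = [
    Hypothesis(("the", "cat", "sat"), -12.3),
    Hypothesis(("a", "cat", "sat"), -12.9),
    Hypothesis(("the", "cat", "slept"), -13.1),
]
print([h.words for h in recombine(hyps)])
# [('the', 'cat', 'sat'), ('the', 'cat', 'slept')]
```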

Cited by 10 publications (22 citation statements) | References 24 publications

“…The optimal tradeoff between number of epochs in combination with multi-stage phonetic training has still not been explored and is left for future work. For recognition we use 4-gram [33] and LSTM language models [34]. We also include a second pass rescoring with a Transformer (Trafo) LM for one of our experiments [35].…”
Section: Experimental Setting (mentioning, confidence: 99%)
“…Neural network LMs are shown to bring consistent improvements over count-based LMs [14,1,2]. These neural LMs are then used either in second-pass lattice rescoring or first pass decoding for ASR [3,4,5,15,16]. To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization, various sampling-based training criteria are proposed and investigated [6,7,8,9,10,11,12].…”
Section: Related Work (mentioning, confidence: 99%)
“…Nowadays, word-based neural language models (LMs) consistently give better perplexities than count-based language models [1,2], and are commonly used for second-pass rescoring or first-pass decoding of automatic speech recognition (ASR) outputs [3,4,5]. One challenge to train such LMs, especially when the vocabulary size is large, is the traversal over the full vocabulary in the softmax normalization.…”
Section: Introduction (mentioning, confidence: 99%)
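The normalization cost mentioned in the excerpt above comes from the standard word-level softmax, which sums over the entire vocabulary V for every prediction; the notation below is generic and not taken from the cited papers:

$$
P(w \mid h) = \frac{\exp(z_w(h))}{\sum_{w' \in V} \exp(z_{w'}(h))}
$$

so each training step costs O(|V|) in the output layer, which is what the sampling-based training criteria referenced above aim to avoid.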
“…Neural LMs are commonly used in second-pass rescoring [1,2,22,23] or first-pass decoding [3] in ASR systems. While for conventional research-oriented datasets like Switchboard the word-level vocabulary size is several dozens of thousands, for larger systems, especially commercially available systems, the vocabulary size can often go up to several hundred thousand.…”
Section: Related Work (mentioning, confidence: 99%)
“…Enjoying the benefit of large amounts of text-only training data, language models (LMs) remain an important part of the modern automatic speech recognition (ASR) pipeline [1,2,3]. However, the large quantity of available data is a double-edged sword, posing real challenges in training.…”
Section: Introduction (mentioning, confidence: 99%)