2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru.2017.8268940
On lattice generation for large vocabulary speech recognition

Cited by 8 publications (4 citation statements) | References 27 publications

“…All previous biasing work, for both conventional and E2E models [2,4,6], combines scores from the contextual LM and the base model (ASR LM or E2E model) on the word or subword lattice, a concept known as on-the-fly lattice rescoring [8]. E2E models are decoded with tight beam thresholds, resulting in far fewer path hypotheses than conventional models.…”
Section: Biasing Before Beam Pruning
confidence: 99%
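For context, a minimal sketch of what such on-the-fly score combination can look like during search, assuming a toy lattice arc type and a hypothetical contextual-LM reward; Arc, rescore_arc, contextual_lm_score, and BIAS_WEIGHT are illustrative names, not the API of any cited system:

import math
from dataclasses import dataclass

@dataclass
class Arc:
    word: str          # word or subword label on the lattice arc
    base_score: float  # log-prob contributed by the base ASR/E2E model
    next_state: int

BIAS_WEIGHT = 0.5  # assumed interpolation weight for the contextual LM

def contextual_lm_score(word: str, bias_phrases: set) -> float:
    # Toy contextual LM: reward words from the biasing list, ignore the rest.
    return math.log(10.0) if word in bias_phrases else 0.0

def rescore_arc(arc: Arc, bias_phrases: set) -> float:
    # Combine base-model and contextual-LM scores on the fly, as each
    # lattice arc is expanded during search, rather than in a second pass.
    return arc.base_score + BIAS_WEIGHT * contextual_lm_score(arc.word, bias_phrases)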
“…These results motivate future work to improve lattice generation [18,19], particularly in E2E ASR systems. Our current research also explores open-vocabulary decoding in a WFST framework, in which novel words may be included in a lattice and in the phrase alternatives derived from it.…”
Section: Discussion
confidence: 70%
“…Using a limited beam size H, the search generates an N-best list with a computational cost of O(H). However, N-best lists are insufficient for some downstream applications or post-processing stages, such as (1) language model rescoring [10-12]; (2) downstream processing of ASR output (e.g., translation [13] and keyword spotting [14,15]); (3) confusion network generation [16,17]; and (4) sequence discriminative training [18-20].…”
Section: Introduction
confidence: 99%
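As a rough illustration of the O(H) pruning that makes a plain N-best list lossy, here is a minimal beam-search sketch; log_prob, vocab, eos, and beam_size are assumed inputs, not drawn from the cited papers:

import heapq

def beam_search_nbest(vocab, log_prob, eos, beam_size, max_len):
    # Keep at most `beam_size` hypotheses per step; everything pruned
    # here is lost forever, which is exactly what a lattice would keep.
    beams = [(0.0, [])]  # (cumulative log-prob, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok in vocab:
                candidates.append((score + log_prob(seq, tok), seq + [tok]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        finished += [b for b in beams if b[1][-1] == eos]
        beams = [b for b in beams if b[1][-1] != eos]
        if not beams:
            break
    # The surviving end-of-sentence hypotheses form the N-best list.
    return sorted(finished, key=lambda c: c[0], reverse=True)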
“…Hypothesis merging (or path recombination) is the main difference between regular beam search and lattice-based search. For conventional methods, Rybach et al. divided merging procedures into two categories [12]: the phone-pair approach [10,21] and the N-best history approach [22-24]. However, neither is suitable for end-to-end models, because both require additional n-gram language models and HMM-based acoustic models to generate lattices.…”
Section: Introduction
confidence: 99%
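A minimal sketch of the recombination idea itself, assuming hypotheses are keyed by a truncated token history; HISTORY and merge_hypotheses are illustrative names and the truncation rule is an assumption, not the phone-pair or N-best history method as defined in [12]:

HISTORY = 3  # number of trailing tokens that count as the "same history" (assumed)

def merge_hypotheses(hyps):
    # Hypotheses whose last HISTORY tokens match are recombined: only the
    # best-scoring one survives, and the merge point is where a lattice
    # would record the discarded alternative as a parallel path.
    best = {}
    for score, seq in hyps:
        key = tuple(seq[-HISTORY:])
        if key not in best or score > best[key][0]:
            best[key] = (score, seq)
    return list(best.values())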