2021
DOI: 10.48550/arxiv.2110.09245
Preprint

Efficient Sequence Training of Attention Models using Approximative Recombination

Cited by 3 publications (6 citation statements) | References 0 publications

“…However, once external language models are included in the training phase, sequence normalization needs to be included explicitly, leading to MMI sequence discriminative training. This has been exploited as a further approach to combine E2E models with external language models trained on text-only data during the training phase itself [128], [129], [130].…”
Section: B. Training With External Language Models (mentioning)
confidence: 99%
“…However, once external language models are included in the training phase, sequence normalization needs to be included explicitly, leading to MMI sequence discriminative training. This has been exploited as a further approach to combine E2E models with external language models trained on text-only data already in the training phase [98], [99], [100].…”
Section: B. Training With External Language Models (mentioning)
confidence: 99%
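
For reference, the MMI sequence-discriminative criterion with an external language model that the statements above refer to is commonly written as below. This is a generic textbook form; the notation (end-to-end model P_AM, external language model P_LM, LM scale λ) is ours for illustration rather than taken from the cited works:

\[
\mathcal{L}_{\text{MMI}} = -\log \frac{P_{\text{AM}}(Y^{*} \mid X)\, P_{\text{LM}}(Y^{*})^{\lambda}}{\sum_{Y} P_{\text{AM}}(Y \mid X)\, P_{\text{LM}}(Y)^{\lambda}}
\]

The denominator provides the explicit sequence-level normalization mentioned in the quotes; in practice the sum over competing hypotheses Y is approximated with an N-best list or a lattice.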
“…Finally, Optimal Completion Distillation (OCD) [112] seeks to minimize the total edit distance using an efficient dynamic programming algorithm. Another body of research on sequence training introduces a separate external language model at training time [113], which can also be done efficiently via approximate lattice recombination [99] and lattice-free approaches [100].…”
Section: Minimum Error Training (mentioning)
confidence: 99%
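
The "total edit distance" that OCD minimizes is the standard Levenshtein distance between hypothesis and reference token sequences. A minimal Python sketch of its dynamic-programming computation is shown below; it is illustrative only and is not the OCD training procedure or the approximate recombination method of the paper under discussion.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences via dynamic programming."""
    # dp[i][j] = minimum edits to transform ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete ref[i-1]
                           dp[i][j - 1] + 1,        # insert hyp[j-1]
                           dp[i - 1][j - 1] + sub)  # substitute or match
    return dp[len(ref)][len(hyp)]

# Example: one substitution ("a" -> "the") and one insertion ("down") -> 2
print(edit_distance("a cat sat".split(), "the cat sat down".split()))
```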
“…More recently, work in [46] applied LM fusion and internal LM estimation during MWE training of an AED model to improve the N-best approximation. Work in [45] exploited a lattice structure in place of the N-best list to calculate the expected word errors. For RNN-T models, [42] applied the same N-best approximation as in AED to calculate the expected errors.…”
Section: B. MWE Training For End-to-End ASR Models (mentioning)
confidence: 99%
“…The sum is performed over all possible sequences, and P(Y|X) is the probability of a specific sequence computed from the end-to-end ASR model output. As it is intractable to enumerate all possible sequences and calculate their probabilities, a common practice widely adopted in MWE training for end-to-end ASR systems [40]-[45] is to use the N-best hypotheses to approximate the expected word errors, as shown in Eqn. (16).…”
Section: MBWE Training For TCPGen (mentioning)
confidence: 99%
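
Eqn. (16) of the cited work is not reproduced on this page; a standard form of the N-best approximation it describes, with N_X the N-best list for utterance X, Y* the reference, and WE(·,·) the word-error count (notation ours), is:

\[
\mathcal{L}_{\text{MWE}} \approx \sum_{Y \in \mathcal{N}_X} \frac{P(Y \mid X)}{\sum_{Y' \in \mathcal{N}_X} P(Y' \mid X)}\; \mathrm{WE}(Y, Y^{*})
\]

The renormalization over the N-best list replaces the intractable sum over all sequences described in the quote.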