2023
DOI: 10.1109/taslp.2022.3224286

Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator

Abstract: Contextual knowledge is essential for reducing speech recognition errors on high-valued long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-tail words obtained using external contextual information. With only a small overhead in memory use and computation cost, TCPGen can structure thousands of biasing words efficiently into a symbolic prefix tree and create a neural shortcut between the tree and…
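The symbolic prefix tree at the heart of TCPGen can be illustrated with a minimal sketch. The class names and the wordpiece tokenisation below are illustrative assumptions, not the paper's implementation; the sketch only shows how a biasing list constrains the set of valid next tokens at each decoding step.

```python
# Minimal prefix tree (trie) over wordpiece sequences, sketching how a
# biasing list constrains the valid next tokens during decoding.
# Class names and tokenisation here are illustrative assumptions.

class TrieNode:
    def __init__(self):
        self.children = {}       # token -> TrieNode
        self.is_word_end = False

def build_biasing_tree(biasing_list):
    """Insert each biasing word (given as a token sequence) into a trie."""
    root = TrieNode()
    for tokens in biasing_list:
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.is_word_end = True
    return root

def valid_next_tokens(root, prefix):
    """Return the tokens the tree allows after a decoded prefix."""
    node = root
    for tok in prefix:
        if tok not in node.children:
            return set()         # prefix left the tree: no biasing candidates
        node = node.children[tok]
    return set(node.children)

# Example: two biasing words split into wordpieces (assumed tokenisation).
tree = build_biasing_tree([["Tur", "ner"], ["Tur", "in"]])
print(sorted(valid_next_tokens(tree, ["Tur"])))  # ['in', 'ner']
```

During decoding, only the tokens returned by a lookup like this receive pointer-generator probability mass, which is what keeps the memory and computation overhead small even for thousands of biasing words.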

Cited by 6 publications (4 citation statements)
References 64 publications
“…With similar levels of R-WER reduction, AED achieved a higher reduction in WER. As analysed in [50], TCPGen produced a much more confident prediction of P_gen with AED than with N-T, where the main reductions in overall WER were attributed to the reduction in R-WER. The improvements using GNNs indicate that the GNN encoding improved the prediction of P_gen, which was more beneficial for the overall WER in AED.…”
Section: B. LibriSpeech 960-hour Results
confidence: 94%
“…where P_mdl(Y) is the probability from the end-to-end system, P_src(Y) is the source-domain LM probability, and P_tgt(Y) is the target-domain LM probability. Extending this idea to contextual biasing with TCPGen [50], BLMD can be applied as shown in Eqn. (7).…”
Section: A. Biasing-Driven LM Discounting (BLMD) for TCPGen
confidence: 99%
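The snippet's Eqn. (7) is not reproduced here, but the general density-ratio form it extends combines the three probabilities log-linearly: subtract the source-domain LM score and add the target-domain LM score. A minimal sketch, assuming illustrative interpolation weights λ and μ (the function name and weight values are not from the paper):

```python
import math

def blmd_score(log_p_mdl, log_p_src, log_p_tgt, lam=0.3, mu=0.3):
    """Density-ratio style LM discounting in log space: discount the
    source-domain LM probability and credit the target-domain LM
    probability, each with its own weight (lam, mu are assumptions)."""
    return log_p_mdl - lam * log_p_src + mu * log_p_tgt

# Toy rescoring example: the target-domain LM prefers hypothesis B,
# which can overturn the end-to-end model's original ranking.
hyp_a = blmd_score(math.log(0.6), math.log(0.5), math.log(0.1))
hyp_b = blmd_score(math.log(0.4), math.log(0.2), math.log(0.4))
```

With these toy numbers hyp_b scores higher than hyp_a even though the end-to-end model alone preferred A, which is the intended effect of discounting the source-domain LM during cross-domain decoding.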
“…The overfitting issue during LSTM training can be mitigated with the use of dropout for LSTMs. An rnnDrop approach is proposed in [24] for use in speech recognition problems.…”
Section: Methods
confidence: 99%
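The distinguishing idea of rnnDrop-style recurrent dropout is that the dropout mask is sampled once per sequence and reused at every timestep, rather than resampled per step. A minimal pure-Python sketch of that idea; the function names and the inverted-dropout scaling are assumptions, not the cited paper's exact formulation:

```python
import random

def rnn_drop_mask(hidden_size, drop_prob, seed=None):
    """Sample one dropout mask per sequence. Scaling kept units by
    1/(1 - drop_prob) is the common inverted-dropout convention
    (an assumption; the original formulation may scale differently)."""
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    return [(1.0 / keep) if rng.random() < keep else 0.0
            for _ in range(hidden_size)]

def apply_per_sequence_dropout(hidden_states, mask):
    """Apply the SAME mask to the hidden state at every timestep,
    so the set of dropped units is fixed for the whole sequence."""
    return [[h * m for h, m in zip(step, mask)] for step in hidden_states]
```

Reusing one mask across timesteps avoids repeatedly disrupting the recurrent state, which is why this variant regularises LSTMs without destroying their memory of earlier frames.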
“…Training-time adaptation. The second category consists of approaches that modify the ASR model during training to incorporate contextual information, often relying on attention-based mechanisms (Jain et al., 2020; Chang et al., 2021; Huber et al., 2021; Sathyendra et al., 2022; Sun et al., 2023a; Munkhdalai et al., 2023; Chan et al., 2023). Such direct integration of contextual information is usually more accurate than shallow fusion, but it comes with the added overhead of retraining the ASR model for every new dictionary to be integrated.…”
Section: Related Work and Background
confidence: 99%
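The attention-based mechanisms this snippet refers to generally score a decoder query against an embedding of each biasing-dictionary entry and normalise the scores with a softmax. A minimal sketch of that pattern; the plain dot-product scoring and all names are illustrative assumptions, not any cited paper's specific model:

```python
import math

def attend_to_bias_entries(query, bias_embeddings):
    """Dot-product attention over biasing-entry embeddings, returning
    normalised weights. A numerically stable softmax (subtracting the
    max score) is used for the normalisation."""
    scores = [sum(q * k for q, k in zip(query, emb))
              for emb in bias_embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: the entry aligned with the query gets the larger weight.
weights = attend_to_bias_entries([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

Because the dictionary entries enter through learned attention rather than external score interpolation, this integration is trained end to end, which is the accuracy advantage (and the retraining cost) the snippet describes.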