2020
DOI: 10.48550/arxiv.2005.01677
Preprint

Fast and Robust Unsupervised Contextual Biasing for Speech Recognition

Young Mo Kang,
Yingbo Zhou

Abstract: Automatic speech recognition (ASR) system is becoming a ubiquitous technology. Although its accuracy is closing the gap with that of human level under certain settings, one area that can further improve is to incorporate user-specific information or context to bias its prediction. A common framework is to dynamically construct a small language model from the provided contextual mini corpus and interpolate its score with the main language model during the decoding process. Here we propose an alternative approach…
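The "common framework" mentioned in the abstract (a small language model built from the contextual mini corpus, with its score interpolated with the main language model) can be illustrated with a minimal sketch. This is not the authors' proposed alternative, which the truncated abstract does not describe. The function names `build_context_lm` and `interpolate`, the unigram simplification, the add-alpha smoothing, the weight `lam`, and the toy probabilities are all assumptions made for illustration.

```python
from collections import Counter


def build_context_lm(mini_corpus, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model estimated from a contextual mini corpus.

    Hypothetical helper: a real system would typically use an n-gram or
    WFST-based model rather than unigrams.
    """
    counts = Counter(w for line in mini_corpus for w in line.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_main, p_ctx, lam):
    """Linear interpolation: (1 - lam) * p_main + lam * p_ctx for each word."""
    return {w: (1.0 - lam) * p_main[w] + lam * p_ctx[w] for w in p_main}


# Toy main LM over a three-word vocabulary (made-up probabilities).
p_main = {"call": 0.5, "john": 0.3, "joan": 0.2}

# Contextual mini corpus, e.g. phrases taken from a user's contact list.
p_ctx = build_context_lm(["call joan", "joan"], vocab=p_main.keys())

# Mixing shifts probability mass toward words seen in the context.
p_mixed = interpolate(p_main, p_ctx, lam=0.5)
```

In this toy setup the main model prefers "john" over "joan", while the interpolated model prefers "joan" because it appears in the contextual corpus. The interpolation weight controls how strongly the context biases the result.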

Cited by 6 publications
(8 citation statements)
References 14 publications
“…Since there is no context vector in RNN-T, the query is derived from the encoder hidden state instead, as shown in Eqn. (13).…”
Section: TCPGen in RNN-T (mentioning)
confidence: 99%
“…Contextual biasing, which integrates contextual knowledge into an automatic speech recognition (ASR) system, has become increasingly important to many applications [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. Contextual knowledge is often represented by a list (referred to as a biasing list) of words or phrases (referred to as biasing words) that are likely to appear in an utterance in a given context.…”
Section: Introduction (mentioning)
confidence: 99%
“…Contextual speech recognition aims at addressing the long-tail word problem in end-to-end ASR systems by incorporating contextual knowledge, and has become increasingly important in many applications [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. Informative contextual knowledge can be extracted from text-based resources such as a users' contact book, playlist and the ontology in a dialogue system, or from other modalities such as presentation slides or objects in a scene.…”
Section: Introduction (mentioning)
confidence: 99%
“…Incorporating dynamic contextual knowledge into end-to-end ASR systems is a challenging problem. Dedicated contextual biasing approaches have been developed, such as shallow fusion (SF) with a special weighted finite-state transducer (WFST), a language model (LM) adapted for contextual knowledge [1][2][3][14][15][16]; attention-based deep context approaches [4][5][6][7][8], and also deep biasing (DB) with a prefix-tree for improved efficiency when dealing with large biasing lists [9,10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Contextual information has been researched by using fusion methods with trained language models (LM) [7,8,9], but also there are works dealing with contextual biasing which utilises a specific context such as named entity or personalised contacts. Majority of works require training an additional representation such as bias encoder [10,11,12], bias LM [13], class based LM [14,15,16], or additional data augmentation of a named entity with Text-to-speech [17]. Our work is similar to [18] that the bias information is encoded to trie structure.…”
Section: Introduction (mentioning)
confidence: 99%
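
Several of the citation statements above describe encoding a biasing list as a prefix tree (trie). A minimal sketch of that data structure follows, assuming whitespace-tokenized phrases. The names `TrieNode`, `build_bias_trie`, and `next_tokens` are hypothetical and do not come from any of the cited papers, which typically operate on subword units inside the decoder.

```python
class TrieNode:
    """One node of the prefix tree; children are keyed by token."""

    def __init__(self):
        self.children = {}
        self.is_end = False  # True if a biasing phrase ends at this node


def build_bias_trie(phrases):
    """Insert each biasing phrase into the trie, token by token."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for token in phrase.split():
            node = node.children.setdefault(token, TrieNode())
        node.is_end = True
    return root


def next_tokens(root, prefix):
    """Return the tokens that can extend `prefix` within the biasing list.

    An empty set means the prefix does not match any biasing phrase.
    """
    node = root
    for token in prefix:
        node = node.children.get(token)
        if node is None:
            return set()
    return set(node.children)


# Toy biasing list, e.g. names from a contact book.
trie = build_bias_trie(["john smith", "john doe", "joan"])
```

A decoder could call `next_tokens` with the tokens emitted so far to find which continuations deserve a bias. Lookup cost depends on the prefix length rather than on the size of the biasing list, which is the efficiency argument the quoted statements make for large lists.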