Improved name recognition with meta-data dependent name networks

Maskey, Sameer; Bacchiani, Michiel; Roark, Brian; Sproat, Richard

doi:10.1109/icassp.2004.1326104

Cited by 6 publications

(4 citation statements)

References 10 publications


“…Aleksic et al. [13] extend class-based LMs [4,5] by creating a user-dependent small LM for contact name recognition on voice commands, which is compiled…”

Footnotes from the citing paper: 1. CTC is the abbreviation for Connectionist Temporal Classification. 2. GMM and HMM are short for Gaussian Mixture Model and Hidden Markov Model, respectively.

Section: The Recognition of OOV Words in End-to-End ASR Models (mentioning)

confidence: 99%


Emphasizing unseen words: New vocabulary acquisition for end-to-end speech recognition

Weber, Wermter

2023

Neural Networks



“…Since it takes substantial efforts to collect labeled OOV speech data for ASR model training, current approaches to tackle the OOV problem mainly involve a language model (LM) or post-processing, for instance, user-dependent language models [4,5], LM rescoring [6] and finite-state transducer lattice extension [7].…”

Section: Introduction (mentioning)

confidence: 99%

Emphasizing unseen words: New vocabulary acquisition for end-to-end speech recognition

Weber, Wermter

2023

Neural Networks


“…For example, names in user's contact list are usually out-of-vocabulary (OOV) and are likely to have very low language model score, thereby making it difficult to accurately predict. These contextual terms can be personal, such as names in the user's contact [3,4], current location [5,6], and songs in the playlist [7]; topic-specific, such as medical domain [8]; or trending terms [9]. In all these scenarios the contextual information is not static and therefore needs to be dynamically incorporated into the language model during the inference stage.…”

Section: Introduction (mentioning)

confidence: 99%

“…In the case when meta-data are available, an entire class of terms can be biased [3,4,5,10,11]. The general idea is to replace every instance of phrases with its class-label to construct a class-based language model [12], and dynamically expand the decoding graph of the class-label into class instances provided in the context during inference.…”

Section: Introduction (mentioning)

confidence: 99%
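The class-based idea quoted above (replace class instances with a class label for training, then expand the label into the instances supplied at inference) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `$CONTACT` label, the add-one smoothed bigram model, the uniform distribution over names within the class, and all function names. It is not the method of any of the cited papers, which operate on decoding graphs rather than on plain token lists.

```python
import math
from collections import Counter

CLASS_LABEL = "$CONTACT"  # hypothetical class label, chosen for this sketch


def to_class_tokens(tokens, names):
    """Replace every token that is a known class instance with the class label."""
    return [CLASS_LABEL if t in names else t for t in tokens]


def train_class_bigram(sentences, names):
    """Count bigrams over training text in which names are mapped to the label."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = ["<s>"] + to_class_tokens(sent.split(), names) + ["</s>"]
        for prev, cur in zip(toks, toks[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
            vocab.add(cur)
    return bigrams, histories, len(vocab)


def log_prob(sentence, bigrams, histories, vocab_size, user_names):
    """Score a hypothesis; the class is expanded with names given at inference."""
    words = sentence.split()
    toks = ["<s>"] + to_class_tokens(words, user_names) + ["</s>"]
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        # Add-one smoothed bigram probability over class-mapped tokens.
        total += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size))
    # Each class instance also pays P(name | class), assumed uniform here.
    n_instances = sum(1 for w in words if w in user_names)
    if n_instances:
        total += n_instances * math.log(1.0 / len(user_names))
    return total


# Names seen in training differ from the names supplied at inference.
training_names = {"alice", "bob"}
model = train_class_bigram(["call alice", "call bob now", "text alice"], training_names)
user_names = {"zofia", "mirek"}
print(log_prob("call zofia", *model, user_names))
print(log_prob("call nobody", *model, user_names))
```

Because the model only ever saw the class label in the name position, a name that never occurred in training still receives a reasonable score once it is listed in the user's contacts, whereas an unlisted word falls back to the smoothed estimate.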

Fast and Robust Unsupervised Contextual Biasing for Speech Recognition

Kang, Zhou

2020

Preprint


Automatic speech recognition (ASR) system is becoming a ubiquitous technology. Although its accuracy is closing the gap with that of human level under certain settings, one area that can further improve is to incorporate user-specific information or context to bias its prediction. A common framework is to dynamically construct a small language model from the provided contextual mini corpus and interpolate its score with the main language model during the decoding process. Here we propose an alternative approach that does not entail explicit contextual language model. Instead, we derive the bias score for every word in the system vocabulary from the training corpus. The method is unique in that 1) it does not require meta-data or class-label annotation for the context or the training corpus. 2) The bias score is proportional to the word's log-probability, thus not only would it bias the provided context, but also robust against irrelevant context (e.g. user misspecified or in case where it is hard to quantify a tight scope). 3) The bias score for the entire vocabulary is pre-determined during the training stage, thereby eliminating computationally expensive language model construction during inference. We show significant improvement in recognition accuracy when the relevant context is available. Additionally, we also demonstrate that the proposed method exhibits high tolerance to false-triggering errors in the presence of irrelevant context.
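The "common framework" that the abstract contrasts itself with (build a small language model from the context and interpolate it with the main model) can be sketched as follows. This is a toy illustration under stated assumptions: unigram models, add-one smoothing over a shared vocabulary, linear interpolation, and a hypothetical weight `lam`. It is not the alternative bias-score method the abstract proposes, whose formula is not given here.

```python
import math
from collections import Counter


def unigram_lm(text, vocab):
    """Add-one smoothed unigram model over a fixed, shared vocabulary."""
    counts = Counter(w for w in text.split() if w in vocab)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolated_log_prob(words, main_lm, context_lm, lam=0.3):
    """Linear interpolation: (1 - lam) * P_main(w) + lam * P_context(w)."""
    return sum(
        math.log((1 - lam) * main_lm[w] + lam * context_lm[w]) for w in words
    )


main_text = "call mom call dad play music play jazz"
context_text = "zofia mirek"  # contextual mini corpus, e.g. contact names
vocab = set(main_text.split()) | set(context_text.split())

main_lm = unigram_lm(main_text, vocab)
context_lm = unigram_lm(context_text, vocab)  # rebuilt whenever the context changes

print(math.log(main_lm["zofia"]))
print(interpolated_log_prob(["zofia"], main_lm, context_lm))
```

The context model has to be rebuilt each time the context changes, which is the inference-time cost the abstract says its pre-computed bias scores avoid.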
