Interspeech 2021
DOI: 10.21437/interspeech.2021-1399

End-to-End Open Vocabulary Keyword Search

Cited by 7 publications (4 citation statements)
References 0 publications
“…The embedding neural network can be any architecture, although only an FSMN is used in this paper. Therefore, to further improve results, we will train more architectures, e.g. [12][13][14], for better performance. Besides this, the overlapping property may be usable for more physically grounded acoustic pattern analysis, and we will try to find a relation between vibrational normal modes and acoustic states.…”
Section: Discussion (mentioning)
confidence: 99%
“…More recently, ASR-free KWS methods have sought to eschew the ASR and its concomitant complexities [6][7][8][9][10][11][12]. Instead of relying on the output of an ASR system, a neural network is trained in an end-to-end (E2E) fashion to locate written queries in large spoken archives.…”
Section: Introduction (mentioning)
confidence: 99%
“…We design two-stream networks to reliably embed linguistic representations of speech and text sequences within a common latent space. Since the audio-text joint latent space places linguistically similar embeddings close to each other [7,8], it is possible to distinguish keywords from other speech inputs. Based on these representations, our proposed method decides whether the input speech contains a keyword or not, by using a cross-attention mechanism [9,10,11].…”
Section: Introduction (mentioning)
confidence: 99%
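The last citation statement describes a two-stream design: speech and text are embedded into a common latent space, and a cross-attention step decides whether the speech contains the keyword. A minimal NumPy sketch of that idea follows; the dimensions, the random linear "encoders", and the pooling-to-sigmoid step are all illustrative assumptions, not the cited authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper).
AUDIO_DIM, TEXT_DIM, LATENT_DIM = 40, 16, 32

# Two-stream "encoders": random linear projections standing in for
# trained networks that map each modality into a shared latent space.
W_audio = rng.standard_normal((AUDIO_DIM, LATENT_DIM)) * 0.1
W_text = rng.standard_normal((TEXT_DIM, LATENT_DIM)) * 0.1

def encode_audio(frames):           # frames: (T, AUDIO_DIM)
    return frames @ W_audio         # -> (T, LATENT_DIM)

def encode_text(tokens):            # tokens: (N, TEXT_DIM)
    return tokens @ W_text          # -> (N, LATENT_DIM)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def keyword_score(frames, tokens):
    """Cross-attention: query tokens attend over speech frames; the
    pooled query/context similarity is squashed into a probability."""
    A = encode_audio(frames)                        # (T, D) speech embeddings
    Q = encode_text(tokens)                         # (N, D) query embeddings
    attn = softmax(Q @ A.T / np.sqrt(LATENT_DIM))   # (N, T) attention weights
    context = attn @ A                              # (N, D) attended speech
    sim = np.sum(Q * context, axis=-1).mean()       # pooled match score
    return 1.0 / (1.0 + np.exp(-sim))               # sigmoid -> probability

frames = rng.standard_normal((100, AUDIO_DIM))  # 100 speech frames
tokens = rng.standard_normal((5, TEXT_DIM))     # 5-token written query
p = keyword_score(frames, tokens)
print(p)  # detection probability for this (speech, query) pair
```

Because both streams land in the same latent space, linguistically similar speech and text embeddings score high under the dot-product attention, which is what lets the detector distinguish the keyword from other speech input.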