Interspeech 2020
DOI: 10.21437/interspeech.2020-2613

End-to-End Keyword Search Based on Attention and Energy Scorer for Low Resource Languages

Cited by 9 publications (8 citation statements)
References 14 publications
“…The integration of an attention mechanism (including a variant called multi-head attention [144]) in (primarily) Seq2Seq acoustic models in order to focus on the keyword(s) of interest has successfully been accomplished by a number of works, e.g., [26], [32], [60], [68], [133], [143], [145]. These works find that incorporating attention provides KWS performance gains with respect to counterpart Seq2Seq models without attention.…”
Section: The Attention Mechanism (mentioning)
confidence: 99%
“…Deep learning and, in particular, end-to-end systems were also recently investigated to solve the STD problem directly. In this direction, several end-to-end ASR-free approaches for STD were proposed [13, 34-36]. In addition to exploring neural end-to-end approaches, deep learning is extensively used to extract representations (embeddings) of audio documents and query terms that facilitate the search [20, 21, 23, 25].…”
Section: Spoken Term Detection (mentioning)
confidence: 99%
“…This program focused on building fully automatic and noise-robust speech recognition and search systems in a very limited amount of time (e.g., one week) and with limited amount of training data. The languages addressed in that program were low-resourced, such as Cantonese, Pashto, Tagalog, Turkish, Vietnamese, Swahili, Tamil and so on, and significant research has been carried out [13, 61, 147-159].…”
Section: Comparison With Previous STD International Evaluations (mentioning)
confidence: 99%
“…However, the approaches mentioned above have two disadvantages: (1) they are ASR-free and designed for a small number of the keywords of interest, and (2) they neglect the timestamps of keywords. Some people work on ASR-free multi-keyword detection [23-25], but the timestamps of keywords are still neglected. Nonetheless, in some practical applications, the timestamps of a large amount of keywords are still required.…”
Section: Introduction (mentioning)
confidence: 99%