Interspeech 2019
DOI: 10.21437/interspeech.2019-1676

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

Abstract: Keyword spotting requires a small memory footprint to run on mobile devices. However, previous works still use several hundred thousand parameters to achieve good performance. To address this issue, we propose a time delay neural network with shared weight self-attention for small-footprint keyword spotting. By sharing weights, the number of self-attention parameters is reduced without any loss in performance. The publicly available Google Speech Commands dataset is used to evaluate the models. The number of param…
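The abstract does not spell out the layer, but the core idea it names, sharing one projection matrix across query, key, and value, can be sketched in a few lines. The NumPy sketch below is an illustration under that assumption (the shapes, the 64-dim TDNN features, and the initialization are all hypothetical), not the paper's implementation: a standard self-attention layer learns three projection matrices, so sharing a single one cuts those parameters to a third.

```python
import numpy as np

def shared_weight_self_attention(x, w):
    """Scaled dot-product self-attention with one projection shared by Q, K, V.

    x: (T, d) sequence of frame features; w: (d, d_k) shared projection.
    A standard layer uses separate W_q, W_k, W_v; sharing one matrix
    reduces the projection parameter count to a third.
    """
    q = k = v = x @ w                             # (T, d_k), single projection
    scores = q @ k.T / np.sqrt(w.shape[1])        # (T, T) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ v                               # (T, d_k) context vectors

# Toy usage: 50 frames of 64-dim TDNN features, projected to 32 dims.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 64))
w = 0.1 * rng.standard_normal((64, 32))
print(shared_weight_self_attention(x, w).shape)  # (50, 32)
```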

Cited by 21 publications (24 citation statements)
References 14 publications
“…As the performance measure, the equal error rate (EER) and the false rejection rate (FRR) at a 1.0% false alarm rate (FAR) were used, which have been widely used in the literature, including [4][5][6][10][11][12]. The FRR and FAR represented the probability of falsely rejecting the WUW inputs and falsely accepting the non-WUW inputs, respectively.…”
Section: Results
confidence: 99%
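For concreteness, both operating points can be computed directly from raw detector scores. This NumPy sketch is illustrative only (it is not code from any of the cited papers) and assumes the usual convention that higher scores mean "more keyword-like":

```python
import numpy as np

def frr_at_far(pos_scores, neg_scores, target_far=0.01):
    """FRR at the decision threshold that yields a fixed FAR.

    pos_scores: detector scores for true wake-up-word (WUW) inputs;
    neg_scores: scores for non-WUW inputs.
    """
    # Threshold above which `target_far` of the negatives still fall.
    thr = np.quantile(neg_scores, 1.0 - target_far)
    return float(np.mean(pos_scores < thr))

def equal_error_rate(pos_scores, neg_scores):
    """EER: the operating point where FRR and FAR coincide (grid over scores)."""
    best_gap, best_rate = np.inf, None
    for thr in np.sort(np.concatenate([pos_scores, neg_scores])):
        frr = np.mean(pos_scores < thr)   # falsely rejected WUW inputs
        far = np.mean(neg_scores >= thr)  # falsely accepted non-WUW inputs
        if abs(frr - far) < best_gap:
            best_gap, best_rate = abs(frr - far), (frr + far) / 2
    return float(best_rate)
```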
“…In earlier research, a support vector machine (SVM) was used for the WUW recognition system [1]. Because deep neural network (DNN) systems have proven highly effective in many fields, there have been numerous efforts to build DNN-based WUW recognizers in various ways [2][3][4][5][6][7][8][9][10][11][12]. In [2], a bidirectional long short-term memory (BLSTM)-based end-to-end model was used to calculate the posterior probability, similar to a hybrid system, and weighted finite-state transducers (WFSTs) were used to generate a confidence score from the calculated posterior probability.…”
Section: Introduction
confidence: 99%
“…In this paradigm (new at the time), the sequence of word posterior probabilities yielded by a DNN is directly processed to determine the possible presence of keywords without the intervention of any HMM (see Figure 2). The deep KWS paradigm has recently attracted much attention [16], [26] for three reasons:
Section: Introduction
confidence: 99%
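The "direct processing" of posteriors that the quote describes is typically posterior smoothing followed by a windowed confidence score. The sketch below follows that common deep KWS recipe under assumed hyperparameters (30-frame smoothing, 100-frame window); individual papers differ in the details, so treat it as a schematic rather than any cited system:

```python
import numpy as np

def keyword_confidence(posteriors, smooth=30, window=100):
    """Keyword confidence from frame-level word posteriors, HMM-free.

    posteriors: (T, n) DNN outputs, one column per word of the keyword.
    Each frame's posterior is averaged over the previous `smooth` frames;
    the confidence at frame t is then the geometric mean of each word's
    maximum smoothed posterior inside the trailing `window` frames.
    """
    T, n = posteriors.shape
    smoothed = np.stack([posteriors[max(0, t - smooth + 1):t + 1].mean(axis=0)
                         for t in range(T)])
    conf = np.empty(T)
    for t in range(T):
        span = smoothed[max(0, t - window + 1):t + 1]
        conf[t] = np.prod(span.max(axis=0)) ** (1.0 / n)
    return conf  # fire the keyword when conf crosses a tuned threshold
```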
“…The computations are more parallelized, in the sense that the processing of a frame does not depend on the completion of processing other frames in the same layer. [15] also explored self-attention in the keyword search (KWS) task. However, the original self-attention requires the entire input sequence to be available before any frames can be processed, and the computational complexity and memory usage are both O(T²).…”
Section: Introduction
confidence: 99%
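The quadratic term the quote refers to comes from materializing the full (T, T) attention score matrix, which also explains why the whole sequence must be available first: every softmax row needs scores against all T frames. A tiny NumPy demonstration with arbitrary sizes:

```python
import numpy as np

# Doubling the sequence length T quadruples both the multiply-adds
# (T * T * d) and the memory for the (T, T) score matrix.
d = 32
for T in (100, 200, 400):
    q = k = np.ones((T, d))
    scores = q @ k.T  # (T, T) attention scores over the full sequence
    print(f"T={T}: scores {scores.shape}, {scores.nbytes / 1e6:.2f} MB")
```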