Interspeech 2016
DOI: 10.21437/interspeech.2016-1485
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting

Abstract: We propose improved Deep Neural Network (DNN) training loss functions for more accurate single keyword spotting on resource-constrained embedded devices. The loss function modifications consist of a combination of multi-task training and weighted cross-entropy. In the multi-task architecture, the keyword DNN acoustic model is trained with two tasks in parallel: the main task of predicting the keyword-specific phone states, and an auxiliary task of predicting LVCSR senones. We show that multi-task learning leads…
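As a rough illustration of the two ideas in the abstract, the sketch below combines a main keyword-phone-state head with an auxiliary LVCSR-senone head and applies a weighted cross-entropy on the main task. PyTorch is assumed; the trunk architecture, head sizes, class weights, and task-weighting factor are placeholders for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskKeywordDNN(nn.Module):
    """Shared feed-forward trunk with two softmax heads: a main head over
    keyword-specific phone states and an auxiliary head over LVCSR senones
    (all sizes here are illustrative)."""
    def __init__(self, feat_dim=40, hidden=512,
                 n_keyword_states=10, n_senones=3000):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.keyword_head = nn.Linear(hidden, n_keyword_states)
        self.senone_head = nn.Linear(hidden, n_senones)

    def forward(self, x):
        h = self.trunk(x)
        return self.keyword_head(h), self.senone_head(h)

# Weighted cross-entropy on the main task: up-weight the (rare) keyword
# phone states relative to the background/filler state (factor is illustrative).
keyword_class_weights = torch.ones(10)
keyword_class_weights[1:] = 5.0
main_loss_fn = nn.CrossEntropyLoss(weight=keyword_class_weights)
aux_loss_fn = nn.CrossEntropyLoss()

def multitask_loss(model, feats, keyword_targets, senone_targets, aux_weight=0.5):
    """Main weighted cross-entropy plus a down-weighted auxiliary senone loss."""
    kw_logits, sen_logits = model(feats)
    return (main_loss_fn(kw_logits, keyword_targets)
            + aux_weight * aux_loss_fn(sen_logits, senone_targets))
```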

Cited by 137 publications (99 citation statements) · References 19 publications
“…The DNN had a binary output indicating the presence or absence of the keyword (e.g., "Alexa") at the middle frame (i.e., most targets are background except for the ≈70 frame targets from each keyword). We found that adding a simultaneous additional ASR label task improved performance (see [20] for details). So, we used this in all experiments except those with knowledge distillation, where it was unnecessary as long as the teacher model was trained with the multi-task objective.…”
Section: Small-Footprint Keyword Spotting
confidence: 89%
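For illustration, the sketch below lays out the kind of per-frame binary targets the statement describes: roughly 70 positive frames per keyword occurrence and background everywhere else. NumPy is assumed, and the helper `make_frame_targets` and its alignment-span input are hypothetical, not taken from the cited work.

```python
import numpy as np

def make_frame_targets(num_frames, keyword_spans, positive_width=70):
    """Build a per-frame binary target: 1 on the frames covering a keyword
    occurrence (about `positive_width` frames each), 0 elsewhere.
    `keyword_spans` is a list of (start_frame, end_frame) pairs, e.g. from a
    forced alignment (assumed available here for illustration)."""
    targets = np.zeros(num_frames, dtype=np.int64)
    for start, end in keyword_spans:
        end = min(end, start + positive_width, num_frames)
        targets[start:end] = 1
    return targets

# Example: a 1000-frame utterance with one "Alexa" occurrence aligned to
# frames 300-370; every other frame is labeled background.
y = make_frame_targets(1000, [(300, 370)])
```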
“…Notably, the DNN is trained to optimize a framewise cross-entropy loss, whereas the detection task is truly a sequence-level task; the two are, however, highly correlated. We trained the DNN using distributed asynchronous SGD [19,20]. We used a performance-based learning-rate schedule (similar to "newbob"), where the learning rate is halved every time performance degrades on the development set.…”
Section: Small-Footprint Keyword Spotting
confidence: 99%
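A minimal sketch of a "newbob"-like schedule as described above, halving the learning rate whenever development-set performance degrades. The halving factor of 0.5 matches the quoted description; treating a stalled loss as degradation and the generator interface are assumptions made for illustration.

```python
def newbob_schedule(initial_lr, dev_losses, factor=0.5):
    """Yield one learning rate per epoch, halving it every time the
    development-set loss fails to improve on the previous epoch."""
    lr = initial_lr
    prev = float("inf")
    for loss in dev_losses:
        if loss >= prev:      # performance degraded (or stalled)
            lr *= factor
        prev = loss
        yield lr

# Example: dev losses over six epochs; the rate halves at epochs 3 and 5.
print(list(newbob_schedule(0.01, [1.0, 0.8, 0.85, 0.7, 0.75, 0.6])))
# -> [0.01, 0.01, 0.005, 0.005, 0.0025, 0.0025]
```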
“…We trained the classifier CNN using the ADAM optimizer [39] with a cross-entropy loss and L2 weight decay with a value of 5 × 10⁻⁵ and a learning rate of 10⁻³. To account for the imbalance of our training set ("good probe" 25% and "bad probe" 75%), we weighted STM images labeled as "good probe" by a factor of 8 when computing the loss [40]. In addition, we increased the available amount of training data via data augmentation, randomly flipping the input SPM images horizontally or vertically.…”
Section: Methods
confidence: 99%
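A minimal PyTorch sketch of the class weighting and flip augmentation described above. The factor of 8 on the minority "good probe" class, the Adam settings (learning rate 10⁻³, L2 weight decay 5 × 10⁻⁵), and the random flips follow the quoted statement; the class-index ordering and the placeholder CNN are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Class-weighted cross-entropy: up-weight the minority "good probe" class
# (25% of the data) by a factor of 8 (index ordering is assumed: 0 = bad, 1 = good).
class_weights = torch.tensor([1.0, 8.0])
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# Data augmentation applied to the training images: random horizontal/vertical flips.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
])

# Adam with the quoted settings; `classifier_cnn` is a placeholder standing in
# for the paper's CNN, which is not specified in the excerpt.
classifier_cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
optimizer = torch.optim.Adam(classifier_cnn.parameters(),
                             lr=1e-3, weight_decay=5e-5)
```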
“…Following the successes in general ASR [2,3], the neural-network-based approach has been extensively explored in the keyword spotting area, with the benefits of lowering resource requirements and improving accuracy [4,5,6,7,8,9,10,11]. Such works include DNN + temporal integration [4,5,11,12] and HMM + DNN hybrid approaches [6,7,8,9,10]. Recently introduced end-to-end trainable DNN approaches [1,13] further improved accuracy and lowered resource requirements through a highly optimizable system design.…”
Section: Introduction
confidence: 99%