Interspeech 2016
DOI: 10.21437/interspeech.2016-1485
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting

Abstract: We propose improved Deep Neural Network (DNN) training loss functions for more accurate single keyword spotting on resource-constrained embedded devices. The loss function modifications consist of a combination of multi-task training and weighted cross-entropy. In the multi-task architecture, the keyword DNN acoustic model is trained with two tasks in parallel: the main task of predicting the keyword-specific phone states, and an auxiliary task of predicting LVCSR senones. We show that multi-task learning leads…
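As a rough illustration of the two ideas in the abstract, the sketch below combines a main keyword-phone-state head with an auxiliary LVCSR-senone head and applies a weighted cross-entropy on the main task. PyTorch is assumed; the trunk architecture, head sizes, class weights, and task-weighting factor are placeholders for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskKeywordDNN(nn.Module):
    """Shared feed-forward trunk with two softmax heads: a main head over
    keyword-specific phone states and an auxiliary head over LVCSR senones
    (all sizes here are illustrative)."""
    def __init__(self, feat_dim=40, hidden=512,
                 n_keyword_states=10, n_senones=3000):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.keyword_head = nn.Linear(hidden, n_keyword_states)
        self.senone_head = nn.Linear(hidden, n_senones)

    def forward(self, x):
        h = self.trunk(x)
        return self.keyword_head(h), self.senone_head(h)

# Weighted cross-entropy on the main task: up-weight the (rare) keyword
# phone states relative to the background/filler state (factor is illustrative).
keyword_class_weights = torch.ones(10)
keyword_class_weights[1:] = 5.0
main_loss_fn = nn.CrossEntropyLoss(weight=keyword_class_weights)
aux_loss_fn = nn.CrossEntropyLoss()

def multitask_loss(model, feats, keyword_targets, senone_targets, aux_weight=0.5):
    """Main weighted cross-entropy plus a down-weighted auxiliary senone loss."""
    kw_logits, sen_logits = model(feats)
    return (main_loss_fn(kw_logits, keyword_targets)
            + aux_weight * aux_loss_fn(sen_logits, senone_targets))
```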

Cited by 137 publications (99 citation statements) · References 19 publications
“…The DNN had a binary output indicating the presence or absence of the keyword (e.g., "Alexa") at the middle frame (i.e., most targets are background except for the ≈70 frame targets from each keyword). We found that adding a simultaneous additional ASR label task improved performance (see [20] for details). So, we used this in all experiments except those with knowledge distillation, where it was unnecessary as long as the teacher model was trained with the multi-task objective.…”
Section: Small-Footprint Keyword Spotting
confidence: 89%
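For illustration, the sketch below lays out the kind of per-frame binary targets the statement describes: roughly 70 positive frames per keyword occurrence and background everywhere else. NumPy is assumed, and the helper `make_frame_targets` and its alignment-span input are hypothetical, not taken from the cited work.

```python
import numpy as np

def make_frame_targets(num_frames, keyword_spans, positive_width=70):
    """Build a per-frame binary target: 1 on the frames covering a keyword
    occurrence (about `positive_width` frames each), 0 elsewhere.
    `keyword_spans` is a list of (start_frame, end_frame) pairs, e.g. from a
    forced alignment (assumed available here for illustration)."""
    targets = np.zeros(num_frames, dtype=np.int64)
    for start, end in keyword_spans:
        end = min(end, start + positive_width, num_frames)
        targets[start:end] = 1
    return targets

# Example: a 1000-frame utterance with one "Alexa" occurrence aligned to
# frames 300-370; every other frame is labeled background.
y = make_frame_targets(1000, [(300, 370)])
```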
“…Notably, the DNN is trained to optimize a framewise cross-entropy loss, whereas the detection task is truly a sequence-level task; the two are, however, highly correlated. We trained the DNN using distributed asynchronous SGD [19,20]. We used a performance-based learning-rate schedule (similar to "newbob"), where the learning rate is halved every time performance degrades on the development set.…”
Section: Small-Footprint Keyword Spotting
confidence: 99%
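A minimal sketch of a "newbob"-like schedule as described above, halving the learning rate whenever development-set performance degrades. The halving factor of 0.5 matches the quoted description; treating a stalled loss as degradation and the generator interface are assumptions made for illustration.

```python
def newbob_schedule(initial_lr, dev_losses, factor=0.5):
    """Yield one learning rate per epoch, halving it every time the
    development-set loss fails to improve on the previous epoch."""
    lr = initial_lr
    prev = float("inf")
    for loss in dev_losses:
        if loss >= prev:      # performance degraded (or stalled)
            lr *= factor
        prev = loss
        yield lr

# Example: dev losses over six epochs; the rate halves at epochs 3 and 5.
print(list(newbob_schedule(0.01, [1.0, 0.8, 0.85, 0.7, 0.75, 0.6])))
# -> [0.01, 0.01, 0.005, 0.005, 0.0025, 0.0025]
```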
“…We trained the classifier CNN using the ADAM optimizer [39] with a cross-entropy loss and L2 weight decay with a value of 5 × 10⁻⁵ and a learning rate of 10⁻³. To account for the imbalance of our training set ("good probe" 25% and "bad probe" 75%), we weighted STM images labeled as "good probe" by a factor of 8 when computing the loss [40]. In addition, we increased the available amount of training data via data augmentation, randomly flipping the input SPM images horizontally or vertically.…”
Section: Methods
confidence: 99%
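A minimal PyTorch sketch of the class weighting and flip augmentation described above. The factor of 8 on the minority "good probe" class, the Adam settings (learning rate 10⁻³, L2 weight decay 5 × 10⁻⁵), and the random flips follow the quoted statement; the class-index ordering and the placeholder CNN are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Class-weighted cross-entropy: up-weight the minority "good probe" class
# (25% of the data) by a factor of 8 (index ordering is assumed: 0 = bad, 1 = good).
class_weights = torch.tensor([1.0, 8.0])
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# Data augmentation applied to the training images: random horizontal/vertical flips.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
])

# Adam with the quoted settings; `classifier_cnn` is a placeholder standing in
# for the paper's CNN, which is not specified in the excerpt.
classifier_cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
optimizer = torch.optim.Adam(classifier_cnn.parameters(),
                             lr=1e-3, weight_decay=5e-5)
```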
“…Following the successes in general ASR [2,3], the neural-network-based approach has been extensively explored in the keyword spotting area, with the benefits of lowering resource requirements and improving accuracy [4,5,6,7,8,9,10,11]. Such works include DNN + temporal integration [4,5,11,12] and HMM + DNN hybrid approaches [6,7,8,9,10]. Recently introduced end-to-end trainable DNN approaches [1,13] further improved accuracy and lowered resource requirements through a highly optimizable system design.…”
Section: Introduction
confidence: 99%