ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414505

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

Abstract: In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different sparsity levels usually need to be separately trained and deployed to heterogeneous target hardware with different resource specifications and for applications that have various latency requirements. In this paper, we present Dynamic Sparsity Neural Networks (DSNN) that, once trai…


Cited by 26 publications (14 citation statements)
References 31 publications (68 reference statements)
“…Dynamic Sparsity ASR: In [36], dynamic sparsity neural networks (DSNN) are proposed, which, once trained, can be operated at different sparsity levels. There, the introduction of our distillation loss for in-place distillation from non-sparse to sparse models was found to yield WER reductions of 0.1-0.4 absolute on the Search test set for models with different levels of sparsity, with a WERR of 5% for a 70% sparse model.…”
Section: Discussion and Future Work
confidence: 99%
“…For a small Conformer model trained on LibriSpeech data, the introduction of the distillation loss yielded a 4.8% relative WER reduction on the test-other dataset. The proposed distillation loss has been incorporated successfully in other recent work, yielding WER improvements in dynamic-sparsity neural networks [36], and yielding significant improvements in both WER and latency for the streaming-mode "Universal ASR" model [34].…”
Section: Discussion
confidence: 99%
“…The majority of these existing approaches have focused on the earlier ASR systems instead of the Deep Neural Network (DNN) based models. Although model pruning has been explored for self-supervised and other ASR models (Lai et al., 2021; Wu et al., 2021; Zhen et al., 2021), data subset selection for fine-tuning self-supervised ASR systems has only been explored in the context of personalization for accented speakers (Awasthi et al., 2021). A phoneme-level error model is proposed which selects sentences that yield a lower test WER as compared to random sentence selection.…”
Section: Related Work
confidence: 99%
“…Yu et al. [25] used this loss function for training encoder modules capable of working in both streaming and full-context speech recognition scenarios. Wu et al. [27] applied the sequence level KD to train models of different sparsity levels. Their method results in unstructured sparsity that needs specialized implementation to fully exploit the computational benefit of sparsity.…”
Section: Related Work
confidence: 99%
“…The second category can be further divided based on whether the methods allow an external agent to control the resources during inference. Some methods [32,33,27] can adjust the forward pass pathway in the network to reduce the computation based on the input. Another set of methods called anytime inference [34,35] allow an external agent to stop the computation at any point and get the best possible prediction.…”
Section: Related Work
confidence: 99%