Interspeech 2019
DOI: 10.21437/interspeech.2019-2811
ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning

Abstract: End-to-end automatic speech recognition (ASR) models are increasingly large and complex to achieve the best possible accuracy. In this paper, we build an AutoML system that uses reinforcement learning (RL) to optimize the per-layer compression ratios when applied to a state-of-the-art attention-based end-to-end ASR model composed of several LSTM layers. We use singular value decomposition (SVD) low-rank matrix factorization as the compression method. For our RL-based AutoML system, we focus on practical consi…
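The SVD low-rank factorization named in the abstract replaces a weight matrix with the product of two smaller matrices, keeping only the top singular values. A minimal sketch of this step with NumPy (the function name and example matrix sizes are illustrative, not from the paper):

```python
import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Factor W (m x n) into A (m x rank) and B (rank x n) whose product
    approximates W, keeping only the top `rank` singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))   # e.g. one LSTM gate matrix
A, B = svd_compress(W, rank=64)

# Parameter count drops from m*n to rank*(m+n).
print(W.size, A.size + B.size)  # 131072 49152
```

In an LSTM layer the single matrix multiply `W @ x` becomes `A @ (B @ x)`, which cuts both parameters and multiply-adds whenever `rank < m*n / (m + n)`.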

Cited by 20 publications (14 citation statements)
References 19 publications
“…In this manner, we remove both the excessive overhead of fine-tuning and the need for labelled data availability, which is crucial for real-world, privacy-aware applications (Wainwright et al., 2012; Shokri & Shmatikov, 2015). Finally, other model compression methods (Fang et al., 2018; Wang et al., 2019a; Dudziak et al., 2019) remain orthogonal to FjORD. System Heterogeneity.…”
Section: Related Work
confidence: 99%
“…NAS aims at automatically designing neural network architectures [15][16][17][18]. For example, He et al. [17] proposed the AutoML for Model Compression (AMC) method and introduced reinforcement learning to learn the optimal parameters of pruning; Dudziak et al. [18] used reinforcement learning to select the per-layer compression ratios based on matrix approximation. However, such methods require massive computation during the search.…”
Section: Related Work
confidence: 99%
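The excerpt above describes RL agents choosing per-layer compression ratios. For the SVD setting, a chosen ratio must still be translated into a concrete rank for each layer; a hedged sketch of one plausible mapping (the helper is an assumption for illustration, not the paper's API):

```python
def ratio_to_rank(m: int, n: int, ratio: float) -> int:
    """Pick an SVD rank so that the factored layer keeps roughly `ratio`
    of the original m*n parameters: rank * (m + n) ~= ratio * m * n."""
    rank = max(1, int(ratio * m * n / (m + n)))
    return min(rank, min(m, n))   # rank cannot exceed min(m, n)

# e.g. a 1024x1024 LSTM gate matrix compressed to ~25% of its parameters:
print(ratio_to_rank(1024, 1024, 0.25))  # 128
```

An RL controller would emit one such ratio per layer, evaluate the compressed model's accuracy as the reward, and iterate; the mapping above simply keeps the per-layer parameter budget consistent with the chosen ratio.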
“…Neural Architecture Search for Super-resolution. Recent SR works aim to build more efficient models using NAS, which has been vastly successful in a wide range of tasks such as image classification [57,55,38], language modeling [56], and automatic speech recognition [12]. We mainly focus on previous works that adopt NAS for SR and refer the reader to Elsken et al. [13] for a detailed survey on NAS.…”
Section: Background and Related Work
confidence: 99%