Recurrent neural networks (RNNs) have recently achieved remarkable successes in a number of applications. However, the huge sizes and computational burden of these models make them difficult to deploy on edge devices. A practically effective approach is to reduce the overall storage and computation costs of RNNs through network pruning techniques. Despite their successful applications, pruning methods based on Lasso produce irregular sparse patterns in weight matrices, which yield little practical speedup. To address this issue, we propose a structured pruning method through neuron selection, which can reduce the sizes of the basic structures of RNNs. More specifically, we introduce two sets of binary random variables, which can be interpreted as gates or switches on the input neurons and the hidden neurons, respectively. We demonstrate that the corresponding optimization problem can be addressed by minimizing the L0 norm of the weight matrix. Finally, experimental results on language modeling and machine reading comprehension tasks indicate the advantages of the proposed method over state-of-the-art pruning competitors. In particular, nearly 20× practical speedup during inference was achieved without losing performance for the language model on the Penn TreeBank dataset.

Keywords: Models, Model Compression

Introduction

Recurrent neural networks (RNNs) have recently achieved remarkable successes in multiple fields such as image captioning [1,2], action recognition [3,4], question answering [5,6], machine translation [7,8,9], and language modeling [10,11,12]. These successes heavily rely on huge models trained on large datasets, especially for RNN variants such as Long Short-Term Memory (LSTM) networks [13] and Gated Recurrent Unit (GRU) networks [14]. With the increasing popularity of edge computing, a recent trend is to deploy these models onto end devices so as to allow off-line reasoning and inference. However, these models are generally huge and incur expensive computation and storage costs during inference, which makes deployment difficult for devices with limited resources. To reduce the overall computation and storage costs of these models, model compression for recurrent neural networks has attracted wide attention.

Network pruning is one of the prominent approaches to compressing RNNs. [15] present a connection pruning method to compress RNNs efficiently. However, the weight matrices obtained via connection pruning have random, unstructured sparsity. Such unstructured sparse formats are unfriendly to efficient computation on modern hardware [16] due to irregular memory access in modern processors. Previous studies [17,18] have shown that the speedups obtained with random sparse matrix multiplication on various hardware platforms are lower than expected. For example, varying the sparsity level in the weight matrices of AlexNet over 67.6%, 92.4%, 94.3%, 96.6%, and 97.2% yields speedup ratios of only 0.25×, 0.52×, 1.36×, 1.04×, and 1.38×, respectively...
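To make the neuron-selection idea above concrete, the following is a minimal sketch of how binary gates on input and hidden neurons could be trained with an L0 penalty. It uses the hard-concrete relaxation of Louizos et al. (2018) as a stand-in for the paper's exact formulation; the class names (NeuronGate, GatedLSTM) and the hyperparameter values are illustrative assumptions, not the authors' code.

```python
import math
import torch
import torch.nn as nn


class NeuronGate(nn.Module):
    """Hard-concrete relaxation of binary on/off gates for a set of neurons (sketch)."""

    def __init__(self, num_neurons, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_neurons))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # Sample stretched hard-concrete gates with the reparameterization trick.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        s = s * (self.zeta - self.gamma) + self.gamma
        # Gate values lie in [0, 1] and hit exactly 0 or 1 with non-zero probability.
        return s.clamp(0.0, 1.0)

    def l0_penalty(self):
        # Expected number of non-zero gates: a differentiable surrogate for the L0 norm.
        return torch.sigmoid(self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)).sum()


class GatedLSTM(nn.Module):
    """LSTM whose input and hidden neurons are masked by learned binary gates."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.in_gate = NeuronGate(input_size)    # gates on input neurons
        self.hid_gate = NeuronGate(hidden_size)  # gates on hidden neurons

    def forward(self, x):
        out, _ = self.lstm(x * self.in_gate())
        return out * self.hid_gate()

    def sparsity_loss(self):
        return self.in_gate.l0_penalty() + self.hid_gate.l0_penalty()
```

Under this sketch, gates driven exactly to zero identify whole input or hidden neurons, so the corresponding rows and columns of the LSTM weight matrices can be removed after training. That is what yields structured rather than random sparsity, and hence practical speedup on standard hardware.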
In this paper, we investigate a new variant of the neural architecture search (NAS) paradigm: searching with random labels (RLNAS). The task sounds counter-intuitive for most existing NAS algorithms, since random labels provide little information about the performance of each candidate architecture. Instead, we propose a novel NAS framework based on an ease-of-convergence hypothesis, which requires only random labels during searching. The algorithm involves two steps: first, we train a SuperNet using random labels; second, from the SuperNet we extract the subnetwork whose weights change most significantly during the training. Extensive experiments are conducted on multiple datasets (e.g. NAS-Bench-201 and ImageNet) and multiple search spaces (e.g. DARTS-like and MobileNet-like). Very surprisingly, RLNAS achieves comparable or even better results compared with state-of-the-art NAS methods such as PC-DARTS and Single Path One-Shot, even though those counterparts use full ground-truth labels for searching. We hope our finding can inspire new understandings of the essence of NAS.
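One way to read the "weights change most significantly" criterion is as the angle between a candidate's weights at initialization and after SuperNet training. The sketch below illustrates that selection rule under this assumption; the helper names (weight_angle, select_architecture) are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np


def weight_angle(w_init, w_trained):
    """Angle between a candidate's flattened weights before and after SuperNet training.

    Assumption: a larger angle means the candidate's weights moved more,
    i.e. it converged more eagerly, even under random labels.
    """
    v0 = np.concatenate([w.ravel() for w in w_init])
    v1 = np.concatenate([w.ravel() for w in w_trained])
    cos = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def select_architecture(candidates, init_weights, trained_weights):
    """Pick the candidate whose weights changed the most during SuperNet training."""
    scores = {c: weight_angle(init_weights[c], trained_weights[c]) for c in candidates}
    return max(scores, key=scores.get)
```

In this reading, the search requires no validation accuracy at all: the SuperNet is trained once on random labels, and architectures are ranked purely by how far their weights traveled from initialization.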