2019
DOI: 10.48550/arxiv.1901.09208
Preprint

Intrinsically Sparse Long Short-Term Memory Networks

Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy

Abstract: Long Short-Term Memory (LSTM) networks have achieved state-of-the-art performance on a wide range of tasks. Their strong performance stems from the long-term memory, which is well suited to sequential data, and the gating structure that controls the information flow. However, LSTMs tend to be memory-bandwidth limited in realistic applications and require prohibitively long training and inference time as model sizes keep growing. To tackle this problem, various efficient model compressio…
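The abstract describes LSTMs whose gate connections are sparse from the start rather than compressed after training. As a rough illustration only (not the authors' code; the gate names, the dict layout, and the fixed binary masks are assumptions), a single step of such a masked LSTM cell in NumPy might look like this:

# Minimal sketch: one step of an LSTM cell whose gate weight matrices are
# element-wise masked by binary masks, so only a fraction of the
# connections ever exists ("intrinsically sparse"). Illustrative only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_lstm_step(x, h, c, W, U, b, masks):
    """x: input vector; h, c: previous hidden and cell state.
    W, U, b and masks['W'], masks['U'] are dicts keyed by gate ('i', 'f', 'o', 'g')."""
    gates = {}
    for k in ('i', 'f', 'o', 'g'):
        # Apply the binary mask so pruned connections contribute nothing.
        Wk = W[k] * masks['W'][k]
        Uk = U[k] * masks['U'][k]
        gates[k] = Wk @ x + Uk @ h + b[k]
    i = sigmoid(gates['i'])   # input gate
    f = sigmoid(gates['f'])   # forget gate
    o = sigmoid(gates['o'])   # output gate
    g = np.tanh(gates['g'])   # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

Because the masks zero out most entries of W and U, only the surviving connections need to be stored and updated, which is where the memory and compute savings would come from.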

Cited by 2 publications (3 citation statements)
References 15 publications
“…For RNN layers, we use cell gate redistribution to update their sparse connectivity. The naive approach is to sparsify all cell gates independently at the same sparsity, as used in Liu et al (2019) which is a straightforward SET extension to RNNs. Essentially, it is more desirable to redistribute new weights to cell gates dependently, as all gates collaborate together to regulate information.…”
Section: Dynamic Sparse Connectivity (citation type: mentioning)
confidence: 99%
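For context on the "straightforward SET extension" the quote contrasts against, here is a minimal sketch of a per-gate prune-and-regrow step: each gate matrix is treated independently at the same sparsity, pruning the smallest-magnitude weights and regrowing at random. The function name, the prune fraction zeta, and the re-initialization scale are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a naive per-gate SET-style update: prune the weakest
# active connections of one gate, then regrow the same number at random
# zero positions, keeping that gate's sparsity level constant.
import numpy as np

def set_update_gate(weight, mask, zeta=0.3, rng=np.random.default_rng()):
    active = np.flatnonzero(mask)
    n_prune = int(zeta * active.size)

    # Prune: drop the active connections with the smallest |weight|.
    order = np.argsort(np.abs(weight.flat[active]))
    pruned = active[order[:n_prune]]
    mask.flat[pruned] = 0
    weight.flat[pruned] = 0.0

    # Regrow: activate an equal number of currently-zero positions at random.
    inactive = np.flatnonzero(mask == 0)
    grown = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[grown] = 1
    weight.flat[grown] = rng.normal(0.0, 0.01, size=n_prune)
    return weight, mask

# Applied independently to every LSTM gate matrix (i, f, o, g), which is the
# per-gate treatment the citing paper contrasts with joint redistribution.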
“…Very recently, instead of using the momentum, The Rigged Lottery [5] grows the zero-weights with the highest magnitude gradients to eliminate the extra floating point operations required by Sparse Momentum. Liu et al [20] trained intrinsically sparse recurrent neural networks (RNNs) that can achieve usually better performance than dense models. Lee et al [17] introduced single-shot network pruning (SNIP) that can discover a sparse network before training based on a connection sensitivity criterion.…”
Section: Sparse Neural Network For Training Efficiency (citation type: mentioning)
confidence: 99%
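The quote summarizes how The Rigged Lottery (RigL) replaces random or momentum-based regrowth with growth driven by gradient magnitude. A hedged sketch of just that growth step follows; the function name and calling convention are assumptions, and the periodic drop step is omitted.

# Minimal sketch of gradient-based growth in the RigL style: activate the
# zero-valued connections whose dense gradients have the largest magnitude.
import numpy as np

def rigl_grow(weight, mask, grad, n_grow):
    """Activate `n_grow` currently-zero connections with the largest |grad|."""
    inactive = np.flatnonzero(mask == 0)
    # Rank the inactive positions by dense-gradient magnitude, descending.
    order = np.argsort(-np.abs(grad.flat[inactive]))
    grown = inactive[order[:n_grow]]
    mask.flat[grown] = 1
    weight.flat[grown] = 0.0   # newly grown connections start at zero
    return weight, mask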
“…Recently, many works have emerged to achieve both, training efficiency and inference efficiency, based on adaptive sparse connectivity [26,28,20,3,5]. Such networks are initialized with a sparse topology and can maintain a fixed sparsity level throughout training.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
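The "initialized with a sparse topology ... fixed sparsity level throughout training" part can be illustrated with an Erdős–Rényi-style random sparse mask of the kind used by SET-like methods; treating this particular density formula as the one intended in the quote is an assumption.

# Minimal sketch: build a binary mask for one layer with expected density
# epsilon * (n_in + n_out) / (n_in * n_out). The mask's density is what stays
# fixed during training, even as individual connections are rewired.
import numpy as np

def er_sparse_mask(n_in, n_out, epsilon=10.0, rng=np.random.default_rng()):
    density = min(1.0, epsilon * (n_in + n_out) / (n_in * n_out))
    return (rng.random((n_out, n_in)) < density).astype(np.float32)

mask = er_sparse_mask(256, 512)
print("density:", mask.mean())   # roughly epsilon * (n_in + n_out) / (n_in * n_out)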