2019
DOI: 10.48550/arxiv.1901.09208
Preprint

Intrinsically Sparse Long Short-Term Memory Networks

Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy

Abstract: Long Short-Term Memory (LSTM) networks have achieved state-of-the-art performance on a wide range of tasks. Their strong performance stems from the long-term memory, which is well suited to sequential data, and the gating structure that controls the information flow. However, LSTMs tend to be memory-bandwidth limited in realistic applications and require prohibitively long training and inference time as model sizes keep growing. To tackle this problem, various efficient model compressio…
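The abstract describes LSTMs whose gate connections are sparse from the start rather than compressed after training. As a rough illustration only (not the authors' code; the gate names, the dict layout, and the fixed binary masks are assumptions), a single step of such a masked LSTM cell in NumPy might look like this:

# Minimal sketch: one step of an LSTM cell whose gate weight matrices are
# element-wise masked by binary masks, so only a fraction of the
# connections ever exists ("intrinsically sparse"). Illustrative only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_lstm_step(x, h, c, W, U, b, masks):
    """x: input vector; h, c: previous hidden and cell state.
    W, U, b and masks['W'], masks['U'] are dicts keyed by gate ('i', 'f', 'o', 'g')."""
    gates = {}
    for k in ('i', 'f', 'o', 'g'):
        # Apply the binary mask so pruned connections contribute nothing.
        Wk = W[k] * masks['W'][k]
        Uk = U[k] * masks['U'][k]
        gates[k] = Wk @ x + Uk @ h + b[k]
    i = sigmoid(gates['i'])   # input gate
    f = sigmoid(gates['f'])   # forget gate
    o = sigmoid(gates['o'])   # output gate
    g = np.tanh(gates['g'])   # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

Because the masks zero out most entries of W and U, only the surviving connections need to be stored and updated, which is where the memory and compute savings would come from.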

Cited by 2 publications (3 citation statements)
References 15 publications
“…For RNN layers, we use cell gate redistribution to update their sparse connectivity. The naive approach is to sparsify all cell gates independently at the same sparsity, as used in Liu et al (2019) which is a straightforward SET extension to RNNs. Essentially, it is more desirable to redistribute new weights to cell gates dependently, as all gates collaborate together to regulate information.…”
Section: Dynamic Sparse Connectivity (citation type: mentioning)
confidence: 99%
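For context on the "straightforward SET extension" the quote contrasts against, here is a minimal sketch of a per-gate prune-and-regrow step: each gate matrix is treated independently at the same sparsity, pruning the smallest-magnitude weights and regrowing at random. The function name, the prune fraction zeta, and the re-initialization scale are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a naive per-gate SET-style update: prune the weakest
# active connections of one gate, then regrow the same number at random
# zero positions, keeping that gate's sparsity level constant.
import numpy as np

def set_update_gate(weight, mask, zeta=0.3, rng=np.random.default_rng()):
    active = np.flatnonzero(mask)
    n_prune = int(zeta * active.size)

    # Prune: drop the active connections with the smallest |weight|.
    order = np.argsort(np.abs(weight.flat[active]))
    pruned = active[order[:n_prune]]
    mask.flat[pruned] = 0
    weight.flat[pruned] = 0.0

    # Regrow: activate an equal number of currently-zero positions at random.
    inactive = np.flatnonzero(mask == 0)
    grown = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[grown] = 1
    weight.flat[grown] = rng.normal(0.0, 0.01, size=n_prune)
    return weight, mask

# Applied independently to every LSTM gate matrix (i, f, o, g), which is the
# per-gate treatment the citing paper contrasts with joint redistribution.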
“…Very recently, instead of using the momentum, The Rigged Lottery [5] grows the zero-weights with the highest magnitude gradients to eliminate the extra floating point operations required by Sparse Momentum. Liu et al [20] trained intrinsically sparse recurrent neural networks (RNNs) that can achieve usually better performance than dense models. Lee et al [17] introduced single-shot network pruning (SNIP) that can discover a sparse network before training based on a connection sensitivity criterion.…”
Section: Sparse Neural Network For Training Efficiency (citation type: mentioning)
confidence: 99%
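The quote summarizes how The Rigged Lottery (RigL) replaces random or momentum-based regrowth with growth driven by gradient magnitude. A hedged sketch of just that growth step follows; the function name and calling convention are assumptions, and the periodic drop step is omitted.

# Minimal sketch of gradient-based growth in the RigL style: activate the
# zero-valued connections whose dense gradients have the largest magnitude.
import numpy as np

def rigl_grow(weight, mask, grad, n_grow):
    """Activate `n_grow` currently-zero connections with the largest |grad|."""
    inactive = np.flatnonzero(mask == 0)
    # Rank the inactive positions by dense-gradient magnitude, descending.
    order = np.argsort(-np.abs(grad.flat[inactive]))
    grown = inactive[order[:n_grow]]
    mask.flat[grown] = 1
    weight.flat[grown] = 0.0   # newly grown connections start at zero
    return weight, mask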
“…Recently, many works have emerged to achieve both, training efficiency and inference efficiency, based on adaptive sparse connectivity [26,28,20,3,5]. Such networks are initialized with a sparse topology and can maintain a fixed sparsity level throughout training.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
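The "initialized with a sparse topology ... fixed sparsity level throughout training" part can be illustrated with an Erdős–Rényi-style random sparse mask of the kind used by SET-like methods; treating this particular density formula as the one intended in the quote is an assumption.

# Minimal sketch: build a binary mask for one layer with expected density
# epsilon * (n_in + n_out) / (n_in * n_out). The mask's density is what stays
# fixed during training, even as individual connections are rewired.
import numpy as np

def er_sparse_mask(n_in, n_out, epsilon=10.0, rng=np.random.default_rng()):
    density = min(1.0, epsilon * (n_in + n_out) / (n_in * n_out))
    return (rng.random((n_out, n_in)) < density).astype(np.float32)

mask = er_sparse_mask(256, 512)
print("density:", mask.mean())   # roughly epsilon * (n_in + n_out) / (n_in * n_out)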