2019
DOI: 10.1016/j.asoc.2019.03.057

Compression of recurrent neural networks for efficient language modeling

Abstract: Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks including Long-Short Term Memory models. We pay particular attention to the high-dimensional output problem caused by the very large vocabulary size. We focus on effective compression meth…
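The abstract singles out the high-dimensional output (softmax) layer as the main bottleneck when the vocabulary is very large. As a rough illustration of the kind of low-rank compression studied in this line of work, here is a minimal PyTorch sketch that replaces the dense hidden-to-vocabulary projection with two low-rank factors; the class and parameter names (LowRankSoftmax, rank) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: approximate a large output projection W (vocab_size x hidden_dim)
# by two smaller factors B (vocab_size x rank) and A (rank x hidden_dim).
# All names and sizes here are illustrative, not the authors' configuration.
import torch
import torch.nn as nn

class LowRankSoftmax(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, rank: int):
        super().__init__()
        self.A = nn.Linear(hidden_dim, rank, bias=False)  # hidden -> rank
        self.B = nn.Linear(rank, vocab_size, bias=True)   # rank -> vocab

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Logits over the vocabulary, computed through the low-rank bottleneck.
        return self.B(self.A(h))

# Parameter count drops from hidden_dim * vocab_size
# to roughly rank * (hidden_dim + vocab_size).
layer = LowRankSoftmax(hidden_dim=256, vocab_size=50_000, rank=64)
logits = layer(torch.randn(32, 256))  # a batch of 32 hidden states
print(logits.shape)                   # torch.Size([32, 50000])
```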

Cited by 30 publications (22 citation statements)
References 18 publications
“…We also compared our approach to other compression techniques: matrix decomposition-based (Grachev et al., 2019) and VD-based (Chirkova et al., 2018). For the last one we used a similar model: a network with one LSTM layer of 256 hidden units.…”
Section: Methods
confidence: 99%
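For context, the comparison model described in the citation above (a single LSTM layer with 256 hidden units) might look roughly like the following sketch; the vocabulary size, embedding size, and untied output layer are assumptions made only for illustration.

```python
# Hedged sketch of a word-level language model with one LSTM layer of 256 units.
# Vocabulary and embedding sizes are placeholders, not values from the cited work.
import torch
import torch.nn as nn

class SmallLSTMLM(nn.Module):
    def __init__(self, vocab_size: int = 10_000, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)  # next-token logits for every position

model = SmallLSTMLM()
logits = model(torch.randint(0, 10_000, (8, 35)))  # 8 sequences of length 35
print(logits.shape)                                # torch.Size([8, 35, 10000])
```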
“…The weight gradually increases from zero to one during the first several epochs of training. This technique allows achieving better final performance of the model because such a train…”
[Embedded results table comparing LR for Softmax (Grachev et al., 2019), TT for Softmax (Grachev et al., 2019), (Chirkova et al., 2018), and DSVI-ARD (Ours); the column values are not recoverable from the extracted text.]
Section: Training and Evaluation
confidence: 99%
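The citation above describes a regularization weight that grows from zero to one over the first several training epochs. A minimal sketch of such a schedule, assuming a simple linear warm-up (the cited work may use a different shape or duration):

```python
# Illustrative warm-up schedule: the regularizer weight rises linearly
# from 0 to 1 over `warmup_epochs` epochs and then stays at 1.
# The name kl_weight and the linear shape are assumptions.
def kl_weight(epoch: int, warmup_epochs: int = 5) -> float:
    return min(1.0, epoch / warmup_epochs)

# Typical usage inside a training loop (loss terms shown schematically):
#   w = kl_weight(epoch)
#   loss = data_loss + w * regularizer
for epoch in range(8):
    print(epoch, kl_weight(epoch))  # 0.0, 0.2, ..., 1.0, 1.0, 1.0
```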
“…Our study concentrated on the ability to estimate dose in heterogeneous geometries, and no effort was made to improve the model efficiency. Various model compression techniques, for example pruning, quantization, and tensor decomposition methods (achieving low-rank structures in the weight matrices) [51][52][53], may substantially lower the number of parameters in fully connected layers [54,55]. The efficiency of the model can be further enhanced through fine-tuning of the model architecture.…”
Section: In This Paper We Have Demonstrated the General Feasibility…
confidence: 99%
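Of the techniques named in this citation (pruning, quantization, tensor decomposition), magnitude pruning is the simplest to sketch. The snippet below is an illustrative example only, not the procedure used in any of the cited studies; the function name and thresholding logic are assumptions.

```python
# Hedged sketch of magnitude pruning: zero out the fraction `sparsity`
# of weights with the smallest absolute value.
import torch

def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

w = torch.randn(256, 10_000)
w_pruned = prune_by_magnitude(w, sparsity=0.9)
print((w_pruned == 0).float().mean())  # ~0.9 of the entries are now zero
```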
“…- Low rank factorization: [10, 36]
- Factorized embedding parameterization: [19]
- Block-Term tensor decomposition: [23, 38]
- Singular Value Decomposition: [37]
- Joint factorization of recurrent and inter-layer weight matrices: [28]
- Tensor train decomposition: [10, 17]
- Sparse factorization: [6]
• [11]
• Applications: In this section, we will discuss the application and success of various model compression methods across popular NLP tasks such as language modeling, machine translation, summarization, sentiment analysis, question answering, natural language inference, paraphrasing, image captioning, and handwritten character recognition.
• Summary and future trends.…”
Section: Tutorial Outline
confidence: 99%
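To make one item of the quoted outline concrete, here is a hedged sketch of factorized embedding parameterization: a small embedding table followed by a projection up to the model dimension. The names and sizes (FactorizedEmbedding, embed_dim, hidden_dim) are illustrative assumptions, not taken from the works cited in the outline.

```python
# Illustrative sketch: factorize a V x H embedding table into
# a V x E table plus an E x H projection, which saves parameters when E << H.
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # V x E
        self.project = nn.Linear(embed_dim, hidden_dim)   # E x H

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.embed(token_ids))

# V*E + E*H parameters instead of V*H.
emb = FactorizedEmbedding(vocab_size=50_000, embed_dim=64, hidden_dim=512)
out = emb(torch.randint(0, 50_000, (32, 20)))  # 32 sequences of length 20
print(out.shape)                               # torch.Size([32, 20, 512])
```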