2017 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2017.7966420

Compressing recurrent neural network with tensor train

Abstract: Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on many complex problems. However, most state-of-the-art RNNs have millions of parameters and require substantial computational resources for training and for predicting on new data. This paper proposes an alternative RNN model that significantly reduces the number of parameters by representing the weight parameters in Tensor Train (TT) format. In this paper, we implement …
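The abstract's core idea, storing a layer's weight matrix as a chain of TT-cores, can be sketched in a few lines of NumPy. The sketch below is illustrative only: the mode factorization (4·8·8 = 256), the TT-ranks, and the helper name `tt_to_matrix` are assumptions made for the example, not the settings used in the paper.

```python
import numpy as np

def tt_to_matrix(cores, out_modes, in_modes):
    """Rebuild the dense weight matrix encoded by TT-matrix cores.

    cores[k] has shape (r_k, out_modes[k], in_modes[k], r_{k+1}) with r_0 = r_d = 1.
    """
    W = cores[0]
    for core in cores[1:]:
        # Contract the trailing TT-rank of W with the leading rank of the next core.
        W = np.tensordot(W, core, axes=([-1], [0]))
    W = W.squeeze(axis=(0, -1))                  # drop the boundary ranks r_0 = r_d = 1
    d = len(cores)
    # Interleaved (m1, n1, m2, n2, ...) -> (m1, ..., md, n1, ..., nd), then flatten.
    W = W.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    return W.reshape(int(np.prod(out_modes)), int(np.prod(in_modes)))

# Illustrative shapes: a 256 x 256 weight with modes (4, 8, 8) and TT-ranks (1, 4, 4, 1).
out_modes, in_modes, ranks = (4, 8, 8), (4, 8, 8), (1, 4, 4, 1)
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
         for k in range(3)]
W = tt_to_matrix(cores, out_modes, in_modes)
print(W.shape, sum(c.size for c in cores), W.size)   # (256, 256) 1344 65536
```

With these arbitrary settings the three cores hold 1,344 parameters in place of the 65,536 entries of the dense matrix they encode, which is the kind of reduction the paper exploits inside the RNN's weight matrices.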


Cited by 86 publications (62 citation statements)
References 23 publications
“…Different matrix decomposition techniques can also be related to this class of compression methods. These methods can be as simple as low-rank decomposition or more complex, like Tensor Train (TT) decomposition [24,25,26,27]. However, the TT-based approach has not been studied in the language modeling task, where there are such issues as high-dimensional input and output data and, as a consequence, more options for configuring the TT decomposition.…”
Section: Related Work (mentioning)
confidence: 99%
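As a point of comparison for the "simple" end of the spectrum this excerpt mentions, here is a minimal truncated-SVD low-rank factorization sketch; it is my own illustration, not code from any of the cited works, and the matrix size and rank are arbitrary.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Best rank-`rank` factorization W ~= A @ B via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # (m, rank), singular values folded into the left factor
    B = Vt[:rank, :]                # (rank, n)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
A, B = low_rank_factors(W, rank=32)
print(A.size + B.size, "parameters instead of", W.size)   # 65536 instead of 1048576
```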
“…This approach was successfully applied to compress fully connected neural networks [24], to develop a convolutional TT layer [25], and to compress and improve RNNs [26,27]. However, there are still no studies of the TT decomposition for language modeling and similar tasks with high-dimensional outputs at the softmax layer.…”
Section: Tensor Train Decomposition (mentioning)
confidence: 99%
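The works cited in this excerpt all replace a dense layer's matrix-vector product with contractions against the TT-cores. Below is a minimal NumPy sketch of that forward pass for three factor modes; the shapes and ranks are again illustrative assumptions, and the single `einsum` is written for d = 3 only.

```python
import numpy as np

# Illustrative TT-matrix: 256 x 256 weight with modes (4, 8, 8) and ranks (1, 4, 4, 1).
out_modes, in_modes, ranks = (4, 8, 8), (4, 8, 8), (1, 4, 4, 1)
rng = np.random.default_rng(0)
G1, G2, G3 = [rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
              for k in range(3)]

def tt_matvec(G1, G2, G3, x):
    """y = W @ x computed from the TT-cores without ever forming the dense W."""
    X = x.reshape(in_modes)                            # (n1, n2, n3)
    # i, j, k index output modes; p, q, r input modes; b, c the TT-ranks
    # (a and d are the size-1 boundary ranks).
    Y = np.einsum('aipb,bjqc,ckrd,pqr->ijk', G1, G2, G3, X)
    return Y.reshape(-1)                               # (m1 * m2 * m3,)

x = rng.standard_normal(int(np.prod(in_modes)))
print(tt_matvec(G1, G2, G3, x).shape)                  # (256,)
```

Inside a fully connected or recurrent layer, this contraction stands in for the ordinary weight-matrix product, which is the substitution the compression works above build on.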
“…A common challenge of the above technique is to determine the tensor rank. Exactly determining a tensor rank in general is NP-hard [47]. Therefore, in practice one often leverages numerical optimization or statistical techniques to obtain a reasonable rank estimation.…”
[Table residue omitted: a flattened table grouping the cited compression works by decomposition format (Tensor Train, CP, Tucker) spilled into this excerpt.]
Section: Compact Deep Learning Models (mentioning)
confidence: 99%
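Since exact rank determination is NP-hard, a common practical heuristic is the truncated-SVD tolerance rule (the criterion the TT-SVD algorithm applies to each unfolding): keep the smallest rank whose discarded singular values stay within a relative error budget. The sketch below illustrates that rule for a single matrix; the function name and the synthetic example are my own assumptions.

```python
import numpy as np

def rank_for_tolerance(A, rel_tol):
    """Smallest r such that the best rank-r approximation of A has
    Frobenius error at most rel_tol * ||A||_F."""
    s = np.linalg.svd(A, compute_uv=False)
    tail = np.cumsum(s[::-1] ** 2)[::-1]          # tail[r] = sum of s_i**2 for i >= r
    budget = (rel_tol ** 2) * np.sum(s ** 2)
    ok = np.nonzero(tail <= budget)[0]
    return int(ok[0]) if ok.size else len(s)

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 200))   # true rank 30
A += 1e-6 * rng.standard_normal(A.shape)                              # mild noise
print(rank_for_tolerance(A, rel_tol=1e-3))                            # 30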
“…In order to avoid the expensive pre-training in the uncompressed format, the work in [52] and [53] directly trained fully connected and convolutional layers in low-rank tensor-train and Tucker formats with the tensor ranks fixed in advance. This idea has also been applied to recurrent neural networks [54], [55].…”
Section: Tensorized Training With a Fixed Rank (mentioning)
confidence: 99%
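To illustrate the "train directly in the compressed format with ranks fixed in advance" idea this excerpt describes, here is a hedged PyTorch sketch of a linear layer whose TT-cores are the trainable parameters. The class name, mode shapes, ranks, and initialization scale are all assumptions made for the example, not the configurations of the cited works.

```python
import torch
import torch.nn as nn

class TTLinear(nn.Module):
    """Linear layer stored as three TT-cores with ranks fixed up front;
    the cores (not a dense weight matrix) receive the gradient updates."""

    def __init__(self, in_modes=(4, 8, 8), out_modes=(4, 8, 8), ranks=(1, 4, 4, 1)):
        super().__init__()
        self.in_modes, self.out_modes = in_modes, out_modes
        self.cores = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
             for k in range(3)]
        )

    def forward(self, x):                               # x: (batch, prod(in_modes))
        X = x.reshape(-1, *self.in_modes)               # (batch, n1, n2, n3)
        G1, G2, G3 = self.cores
        # z is the batch index; b, c are TT-ranks; a, d the size-1 boundary ranks.
        Y = torch.einsum('aipb,bjqc,ckrd,zpqr->zijk', G1, G2, G3, X)
        return Y.reshape(x.shape[0], -1)                # (batch, prod(out_modes))

layer = TTLinear()
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
loss = layer(torch.randn(16, 256)).pow(2).mean()        # dummy objective
loss.backward()
opt.step()                                              # the TT-cores are updated directly
```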
“…There are many RNN compression methods based on specific weight matrix representations (Tjandra et al., 2017; Le et al., 2015) or sparsification (Narang et al., 2017; Wen et al., 2018). In this paper we focus on RNN compression via sparsification.…”
Section: Introduction (mentioning)
confidence: 99%
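For contrast with the factorization-based methods above, here is a minimal sketch of the magnitude-based sparsification idea; it is a generic illustration, not the specific schedules of Narang et al. (2017) or Wen et al. (2018).

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the `sparsity` fraction of entries of W with the smallest magnitude."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    threshold = np.partition(np.abs(W), k, axis=None)[k]   # k-th smallest |w|
    return np.where(np.abs(W) < threshold, 0.0, W)

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
W_sparse = magnitude_prune(W, sparsity=0.9)
print(float(np.mean(W_sparse == 0)))                        # ~0.9
```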