Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), 2019
DOI: 10.18653/v1/w19-4306

Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

Abstract: Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, allowing to achie…
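
For context, the encoder-decoder weight tying mentioned in the abstract refers to sharing the input embedding matrix with the output softmax projection of the language model. Below is a minimal, hypothetical PyTorch sketch of that idea (not the paper's implementation; the class name and layer sizes are illustrative assumptions):

```python
import torch.nn as nn


class TiedLSTMLM(nn.Module):
    """Illustrative LSTM language model with encoder-decoder weight tying:
    the input embedding and the output projection share one weight matrix."""

    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.decoder.weight = self.embed.weight  # tying: one matrix serves both layers

    def forward(self, tokens):            # tokens: (batch, seq_len) token ids
        hidden, _ = self.lstm(self.embed(tokens))
        return self.decoder(hidden)       # logits over the vocabulary
```

Tying works here because the embedding matrix and the decoder weight have the same shape (vocab_size × hidden_size), so the input and output vocabularies share a single set of parameters.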

Cited by 1 publication (2 citation statements) · References 16 publications
“…Earlier works have focused on inducing sparsity in standard feed-forward neural networks. Yet, Bayesian pruning methods have also been successfully applied to recurrent neural networks (RNNs) [Kodryan et al. 2019; Lobacheva et al. 2018]. Lobacheva et al. [2018] use Sparse VD to prune individual weights of an LSTM or follow the approach from Louizos et al. [2017] to sparsify neurons or gates and show results on text classification or language modeling problems.…”
Section: Variational Selection Schemes
confidence: 99%
“…Lobacheva et al. [2018] use Sparse VD to prune individual weights of an LSTM or follow the approach from Louizos et al. [2017] to sparsify neurons or gates and show results on text classification or language modeling problems. Kodryan et al. [2019] use instead the Automatic Relevance Determination (ARD) framework, in which a zero-mean element-wise factorized Gaussian prior distribution over the parameters is used, together with a corresponding Gaussian factorized posterior, such that a closed-form expression of the KL divergence term of the variational lower bound is obtained. Subsequently, the Doubly Stochastic Variational Inference (DSVI) method is used to maximize the variational lower bound, and the weights for which the prior variances are lower than a certain threshold are set to zero.…”
Section: Variational Selection Schemes
confidence: 99%
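
To make the mechanism described in the citation above concrete, here is a minimal, hypothetical PyTorch sketch of an ARD-style linear layer (not the authors' code; the class name, initialization, and pruning threshold are illustrative assumptions). Each weight has a Gaussian posterior N(mu, sigma^2) and a zero-mean Gaussian prior; setting the prior variance to its optimal value mu^2 + sigma^2 gives the closed-form KL term 0.5 * log(1 + mu^2 / sigma^2) per weight, and weights whose learned prior variance falls below a threshold are zeroed out:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ARDLinear(nn.Module):
    """Hypothetical sketch: linear layer with factorized Gaussian posterior
    q(w) = N(mu, sigma^2) and a zero-mean element-wise Gaussian (ARD) prior."""

    def __init__(self, in_features, out_features, threshold=1e-3):
        super().__init__()
        self.mu = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.threshold = threshold  # assumed pruning threshold on the prior variance

    def kl(self):
        # Closed-form KL(q || p) with the prior variance at its optimum
        # lambda* = mu^2 + sigma^2, i.e. 0.5 * log(1 + mu^2 / sigma^2) per weight.
        sigma2 = self.log_sigma2.exp()
        return 0.5 * torch.log1p(self.mu.pow(2) / sigma2).sum()

    def forward(self, x):
        if self.training:
            # Reparameterized weight sample for a stochastic estimate of the ELBO.
            w = self.mu + self.log_sigma2.mul(0.5).exp() * torch.randn_like(self.mu)
        else:
            # Prune: keep only weights whose learned prior variance exceeds the threshold.
            keep = (self.mu.pow(2) + self.log_sigma2.exp()) > self.threshold
            w = self.mu * keep
        return F.linear(x, w)
```

In training, the objective would be the usual language-modeling cross-entropy plus the summed KL terms of all such layers (the negative variational lower bound); the "doubly stochastic" part of DSVI refers to gradients being stochastic both over minibatches and over the Monte Carlo sampling of the weights.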