Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1319

Bayesian Compression for Natural Language Processing

Abstract: In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters. The majority of these parameters are often concentrated in the embedding layer, whose size grows proportionally to the vocabulary length. We propose a Bayesian sparsification technique for RNNs which allows compressing the RNN dozens or hundreds of times without time-consuming hyperparameter tuning. We also generalize the model for vocabulary sparsification…
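The method builds on sparse variational dropout (referred to as the SparseVD method in the citing papers below): each weight gets a factorized Gaussian posterior with a learned per-weight dropout rate, and weights whose rate grows large are pruned after training. The snippet below is a minimal, hypothetical PyTorch-style sketch of such a layer applied to the embedding matrix; the class name SparseVDEmbedding, the initialization constants, and the pruning threshold are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseVDEmbedding(nn.Module):
    """Embedding layer with sparse variational dropout (illustrative sketch).

    Each weight w has a factorized Gaussian posterior q(w) = N(theta, sigma^2),
    equivalently a per-weight dropout rate alpha = sigma^2 / theta^2. Weights
    whose log alpha exceeds a threshold are treated as pruned at test time.
    """

    def __init__(self, vocab_size, emb_dim, threshold=3.0):
        super().__init__()
        self.theta = nn.Parameter(0.01 * torch.randn(vocab_size, emb_dim))
        self.log_sigma2 = nn.Parameter(torch.full((vocab_size, emb_dim), -10.0))
        self.threshold = threshold  # prune where log alpha > threshold

    @property
    def log_alpha(self):
        # log alpha = log sigma^2 - log theta^2
        return self.log_sigma2 - torch.log(self.theta ** 2 + 1e-8)

    def forward(self, token_ids):
        if self.training:
            # Reparameterization: sample weights from the Gaussian posterior.
            eps = torch.randn_like(self.theta)
            weight = self.theta + torch.exp(0.5 * self.log_sigma2) * eps
        else:
            # Deterministic test-time pass: zero out high-dropout-rate weights.
            mask = (self.log_alpha < self.threshold).to(self.theta.dtype)
            weight = self.theta * mask
        return F.embedding(token_ids, weight)

    def kl(self):
        # Approximate KL(q || log-uniform prior) from Molchanov et al. (2017).
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()
```

During training, the returned KL term (scaled against the data log-likelihood, e.g. divided by the number of training batches) is added to the loss to form the variational objective; at test time only the unpruned weights remain. The paper's vocabulary sparsification works in the same spirit but, roughly speaking, attaches the sparsity-inducing factor per word rather than per weight, so entire embedding rows (and hence vocabulary entries) can be dropped.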


Cited by 11 publications (58 citation statements). References 9 publications.
“…This technique allows achieving better final performance of the model because such a train…” [the quotation is interrupted by a spilled results-table fragment comparing LR for Softmax (Grachev et al., 2019), TT for Softmax (Grachev et al., 2019), SparseVD (Chirkova et al., 2018), and DSVI-ARD (Ours)]
Section: Training and Evaluation
confidence: 85%
“…, perplexity and accuracy on the test set. The comparison of DSVI-ARD with other dense layers compression approaches revealed that our models can exhibit comparable perplexity quality while achieving much higher compression (in the Grachev et al. (2019) case) and even surpass models based on similar Bayesian compression techniques (in Chirkova et al. (2018)…”
Section: Training and Evaluation
confidence: 86%
“…In Table 2, the state-of-the-art perplexities for language modeling problem are assembled. In addition, we present in the last two rows of this table the best known results (for the PTB dataset) of compressed RNNs using SparseVD method [33]. Here the number of parameters for the compressed model from the paper [33] is computed in line with the remaining models as follows.…”
Section: Compression Results
confidence: 99%
“…The main advantage of the Bayesian sparsification techniques is that they have a small number of hyperparameters compared to pruning-based methods. As stated in (Chirkova et al., 2018), Bayesian compression also leads to a higher sparsity level (Neklyudov et al., 2017; Louizos et al., 2017). Our proposed VVD is inspired by these predecessors to specifically tackle the vocabulary redundancy problem in NLP tasks.…”
Section: Related Work
confidence: 96%