2017
DOI: 10.1007/978-3-319-69900-4_44

Neural Networks Compression for Language Modeling

Abstract: In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks used in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially acute for mobile applications, where constant interaction with a remote server is impractical. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization…
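
As context for the comparison described in the abstract, a minimal PyTorch sketch of a word-level LSTM language model of the kind such compression methods target; the vocabulary and layer sizes are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Word-level LSTM language model; sizes are illustrative, not taken from the paper."""
    def __init__(self, vocab_size=10000, embed_dim=650, hidden_dim=650, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer word ids
        emb = self.embed(tokens)
        out, state = self.lstm(emb, state)
        return self.decoder(out), state

model = LSTMLanguageModel()
n_params = sum(p.numel() for p in model.parameters())
print(f"uncompressed parameters: {n_params / 1e6:.1f}M")

Most of the parameters sit in the embedding, output softmax, and LSTM weight matrices, which is why pruning, quantization, and matrix decompositions all target those matrices.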


Cited by 23 publications (7 citation statements)
References 8 publications
“…However, the significantly large number of weights incurs massive computation and storage burden, hindering the deployment of the state-of-the-art deep learning methods on resource-constrained platforms, such as mobile phones and embedded devices. It has been extensively studied and shown that there exists inherent redundancy in these weights, and there have been increasing research efforts on removing this redundancy, which is known as weight pruning [13], [14], [29], [30].…”
Section: B. Network Optimization (mentioning)
confidence: 99%
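
To make the weight-pruning idea in the statement above concrete, a minimal numpy sketch of magnitude-based pruning of a single weight matrix; the target sparsity and matrix shape are illustrative assumptions, and pruning pipelines typically retrain afterwards to recover accuracy.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold          # keep only weights above the threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(650, 2600))                # e.g. a stacked LSTM gate matrix (hypothetical shape)
W_pruned = magnitude_prune(W, sparsity=0.9)
print("nonzero fraction:", np.count_nonzero(W_pruned) / W.size)
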
“…[28], [31]. Low-rank matrix factorization [14], [29] is another way of pruning by decomposing the original weight matrix into the linear composition of a set of low-rank weight matrices. Even though these methods can achieve good compression ratio by constraining the rank to a small number, they also incur significant (>3%) accuracy loss.…”
Section: B. Network Optimization (mentioning)
confidence: 99%
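
As a sketch of the low-rank factorization mentioned in this statement, the snippet below replaces a dense matrix W with two factors A and B obtained from a truncated SVD; the rank and matrix size are illustrative, and practical methods usually fine-tune the factors afterwards.

import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) by A @ B with A (m x rank) and B (rank x n) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(650, 650))
A, B = low_rank_factorize(W, rank=64)
ratio = W.size / (A.size + B.size)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"compression ratio: {ratio:.1f}x, relative error: {err:.3f}")

The accuracy caveat in the statement shows up here directly: pushing the rank down raises the compression ratio but also the approximation error.
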
“…Some of them were successfully applied to audio processing [17] and image processing [40]. However, they are not yet well-studied in the language modeling task [14].…”
Section: Pruning and Quantization (mentioning)
confidence: 99%
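
Since this statement groups pruning with quantization, a minimal numpy sketch of uniform post-training quantization of a weight matrix to 8-bit integers with a single per-matrix scale; this is a generic baseline, not the specific scheme evaluated in the paper.

import numpy as np

def uniform_quantize(W: np.ndarray, num_bits: int = 8):
    """Map W to signed integers with one scale per matrix (valid for num_bits <= 8)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(W)) / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(650, 650)).astype(np.float32)
q, scale = uniform_quantize(W)
err = np.linalg.norm(W - dequantize(q, scale)) / np.linalg.norm(W)
print(f"int8 storage is 4x smaller than float32, relative error: {err:.4f}")
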
“…Similarly we can apply TT-decomposition to each matrix of LSTM layer (11)-(14) or the matrix of the output layer (4). Moreover, according to [41], the matrix-by-vector product and matrix sum can be efficiently implemented directly in the TT format without the need to convert these matrices to the TT.…”
Section: Tensor Train Decomposition (mentioning)
confidence: 99%
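
To illustrate the Tensor Train idea in this statement, a minimal numpy sketch of the TT-SVD construction: a weight matrix is reshaped into a higher-order tensor and decomposed into a chain of small cores. The reshaping, rank cap, and matrix size are illustrative assumptions, and the efficient TT matrix-by-vector products attributed to [41] in the statement are not implemented here.

import numpy as np

def tt_svd(tensor: np.ndarray, max_rank: int):
    """Decompose a d-way tensor into TT cores G_k of shape (r_{k-1}, n_k, r_k) via sequential truncated SVDs."""
    dims = tensor.shape
    cores, r_prev = [], 1
    unfolding = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(unfolding, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        unfolding = (s[:r, None] * Vt[:r, :]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(unfolding.reshape(r_prev, dims[-1], 1))
    return cores

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))                          # hypothetical weight matrix
cores = tt_svd(W.reshape(8, 8, 8, 8, 8, 8), max_rank=16) # 512*512 elements reshaped to an 8^6 tensor
print("full parameters:", W.size, "TT parameters:", sum(c.size for c in cores))
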