2019
DOI: 10.1016/j.neunet.2018.10.016
ADA-Tucker: Compressing deep neural networks via adaptive dimension adjustment tucker decomposition

Abstract: Despite recent success of deep learning models in numerous applications, their widespread use on mobile devices is seriously impeded by storage and computational requirements. In this paper, we propose a novel network compression method called Adaptive Dimension Adjustment Tucker decomposition (ADA-Tucker). With learnable core tensors and transformation matrices, ADA-Tucker performs Tucker decomposition of arbitrary-order tensors. Furthermore, we propose that weight tensors in networks with proper order and ba…

Cited by 23 publications (10 citation statements)
References 18 publications
“…Compression methods based on pre-trained models mainly take the form of indirect compression, which consists of three steps: pre-training, model compression, and fine-tuning. Tucker-2 decomposition 10 , a mathematical method for exploiting the low-rank structure of large-scale tensor data, has been widely used for indirect compression.…”
Section: Introduction
confidence: 99%
“…However, other compression methods, such as pruning 11 and knowledge distillation 12 , have demonstrated that proper utilization of information from pre-trained models is crucial for model compression 13 . In Tucker decomposition-based indirect compression 10 , although information from the pre-trained model is utilized, the pre-trained model itself lacks the low-rank property. The approximation error after direct low-rank tensor decomposition is therefore too large to recover properly, even with fine-tuning.…”
Section: Introduction
confidence: 99%
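The Tucker-2 compression discussed in the statements above can be sketched in NumPy. This is an illustrative HOSVD-style truncation, not the exact algorithm of the cited papers; the function names and interface are assumptions. Only the two channel modes of a convolution kernel are factorized, leaving the small spatial modes inside the core:

```python
import numpy as np

def tucker2(W, r_out, r_in):
    """Tucker-2 of a conv kernel W with shape (out_ch, in_ch, kh, kw):
    factorize only the two channel modes (HOSVD-style truncation)."""
    out_ch, in_ch, kh, kw = W.shape
    # mode-0 (output-channel) unfolding; keep leading left singular vectors
    U0, _, _ = np.linalg.svd(W.reshape(out_ch, -1), full_matrices=False)
    A = U0[:, :r_out]                                   # (out_ch, r_out)
    # mode-1 (input-channel) unfolding
    U1, _, _ = np.linalg.svd(np.moveaxis(W, 1, 0).reshape(in_ch, -1),
                             full_matrices=False)
    B = U1[:, :r_in]                                    # (in_ch, r_in)
    # project W onto both factor subspaces to obtain the core
    core = np.einsum('oihw,or,is->rshw', W, A, B)
    return core, A, B

def reconstruct(core, A, B):
    """Expand the core back to the original kernel shape."""
    return np.einsum('rshw,or,is->oihw', core, A, B)
```

At full ranks the reconstruction is exact; truncating `r_out` and `r_in` trades approximation error for a smaller parameter count, which is precisely the error that fine-tuning then tries to recover.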
“…Among various compression methods such as pruning [16,29,19] and weight decomposition [36], quantization methods [9,37,7,27,14] compress a neural network by using a lower bit-width for weight values without changing the model architecture, which is particularly useful for carefully designed network architectures like transformers. Quantizing both weights and inputs can speed up inference by turning floating-point operations into integer or bit operations.…”
Section: Introduction
confidence: 99%
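As a minimal illustration of the weight-quantization idea mentioned above, here is symmetric per-tensor int8 quantization; the scheme and names are assumptions for illustration, not the method of the cited works:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0  # guard against all-zero w
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale
```

The stored model keeps only the int8 codes and one float scale per tensor, a roughly 4x size reduction over float32; the rounding error per weight is at most half a quantization step.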
“…Further research in the area, in particular involving the combination of the TKD with NNs, includes the work in [16] and [17]. In [16] the TKD was used as a convolution kernel for mobile applications, while the authors in [17] adopted a more theoretical approach and proposed a method to adaptively adjust the dimensions of the weight tensor in a layer-wise fashion. Both methods have been shown to achieve significant NN parameter compression and a consequent reduction in training time.…”
Section: Introduction
confidence: 99%
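A back-of-the-envelope parameter count shows why adjusting the tensor order before a Tucker factorization can pay off. The mode sizes and ranks below are arbitrary choices for illustration, not the layer-wise procedure of [17]:

```python
import numpy as np

W = np.zeros((512, 256))
full = W.size                                            # 131072 parameters

# 2nd-order Tucker (equivalent to a truncated SVD) with rank 32:
rank2 = 32
params_2d = 512 * rank2 + 256 * rank2 + rank2 * rank2    # factors + core

# reshape to a 4th-order tensor with balanced modes, rank 8 per mode:
T = W.reshape(16, 32, 16, 16)
rank4 = 8
params_4d = (16 + 32 + 16 + 16) * rank4 + rank4 ** 4     # factors + core

print(full, params_2d, params_4d)  # 131072 25600 4736
```

Raising the order shrinks each factor matrix dramatically while the core grows only as the product of the per-mode ranks, so a well-chosen reshape can cut the parameter count by a further order of magnitude.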
“…The authors of the seminal paper [12] are the only ones to have analytically re-derived back-propagation [18] in terms of the tensor factors stemming from the TT decomposition; however, the dimensionality and order of the employed TTs were chosen arbitrarily, so that, despite the achieved compression, the results were not physically interpretable. Most other research in the field tends to rely on automatic differentiation for back-propagation [13], [14], [16], [17]. Although this is convenient for implementation purposes, as automatic differentiation is quite efficient, there remains a need for a deep analytical understanding of how errors are propagated; this is fundamental to the explainability of DNNs.…”
Section: Introduction
confidence: 99%
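The TT decomposition referenced above can be sketched with the standard TT-SVD procedure (a sequence of truncated SVDs over successive unfoldings); the interface and rank-selection convention here are assumptions for illustration:

```python
import numpy as np

def tt_svd(T, ranks):
    """TT-SVD: sequential truncated SVDs yield cores G_k of shape
    (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_d = 1."""
    dims = T.shape
    d = len(dims)
    cores = []
    r_prev = 1
    M = T.reshape(dims[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(ranks[k], len(S))                 # truncate to requested rank
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # carry the remaining factor forward and re-fold the next mode
        M = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    T = cores[0]                                  # (1, n_0, r_1)
    for G in cores[1:]:
        T = np.einsum('...r,rns->...ns', T, G)    # chain over the shared rank
    return T.reshape(T.shape[1:-1])               # drop the boundary 1-ranks
```

With ranks large enough to avoid truncation the reconstruction is exact; truncating the ranks is what produces the compression, and the approximation error that fine-tuning or analytical back-propagation must then contend with.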