2019
DOI: 10.1016/j.neunet.2018.10.016
ADA-Tucker: Compressing deep neural networks via adaptive dimension adjustment tucker decomposition

Abstract: Despite recent success of deep learning models in numerous applications, their widespread use on mobile devices is seriously impeded by storage and computational requirements. In this paper, we propose a novel network compression method called Adaptive Dimension Adjustment Tucker decomposition (ADA-Tucker). With learnable core tensors and transformation matrices, ADA-Tucker performs Tucker decomposition of arbitrary-order tensors. Furthermore, we propose that weight tensors in networks with proper order and ba…

Cited by 23 publications (10 citation statements)
References 18 publications
“…Compression methods based on pre-trained models mainly take the form of indirect compression, which consists of three steps: pre-training, model compression, and fine-tuning. Tucker-2 decomposition 10 , a mathematical method for exploiting the low-rank structure of large-scale tensor data, has been widely used for indirect compression.…”
Section: Introduction
confidence: 99%
“…However, other compression methods, such as pruning 11 and knowledge distillation 12 , have demonstrated that proper utilization of information from pre-trained models is crucial for model compression 13 . In Tucker decomposition-based indirect compression 10 , although information from the pre-trained model is utilized, the pre-trained model itself lacks the low-rank property. The approximation error after direct low-rank tensor decomposition is therefore too large to recover properly, even with fine-tuning.…”
Section: Introduction
confidence: 99%
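The Tucker-2 compression discussed in the statements above can be sketched in NumPy. This is an illustrative HOSVD-style truncation, not the exact algorithm of the cited papers; the function names and interface are assumptions. Only the two channel modes of a convolution kernel are factorized, leaving the small spatial modes inside the core:

```python
import numpy as np

def tucker2(W, r_out, r_in):
    """Tucker-2 of a conv kernel W with shape (out_ch, in_ch, kh, kw):
    factorize only the two channel modes (HOSVD-style truncation)."""
    out_ch, in_ch, kh, kw = W.shape
    # mode-0 (output-channel) unfolding; keep leading left singular vectors
    U0, _, _ = np.linalg.svd(W.reshape(out_ch, -1), full_matrices=False)
    A = U0[:, :r_out]                                   # (out_ch, r_out)
    # mode-1 (input-channel) unfolding
    U1, _, _ = np.linalg.svd(np.moveaxis(W, 1, 0).reshape(in_ch, -1),
                             full_matrices=False)
    B = U1[:, :r_in]                                    # (in_ch, r_in)
    # project W onto both factor subspaces to obtain the core
    core = np.einsum('oihw,or,is->rshw', W, A, B)
    return core, A, B

def reconstruct(core, A, B):
    """Expand the core back to the original kernel shape."""
    return np.einsum('rshw,or,is->oihw', core, A, B)
```

At full ranks the reconstruction is exact; truncating `r_out` and `r_in` trades approximation error for a smaller parameter count, which is precisely the error that fine-tuning then tries to recover.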
“…Among various compression methods such as pruning [16,29,19] and weight decomposition [36], quantization methods [9,37,7,27,14] compress a neural network by using a lower bit-width for weight values without changing the model architecture, which is particularly useful for carefully designed network architectures like transformers. Quantizing both weights and inputs can speed up inference by turning floating-point operations into integer or bit operations.…”
Section: Introduction
confidence: 99%
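As a minimal illustration of the weight-quantization idea mentioned above, here is symmetric per-tensor int8 quantization; the scheme and names are assumptions for illustration, not the method of the cited works:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0  # guard against all-zero w
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale
```

The stored model keeps only the int8 codes and one float scale per tensor, a roughly 4x size reduction over float32; the rounding error per weight is at most half a quantization step.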
“…Further research in the area, in particular involving the combination of the TKD with NNs, includes the work in [16] and [17]. In [16] the TKD was used as a convolution kernel for mobile applications, while the authors in [17] adopted a more theoretical approach and proposed a method to adaptively adjust the dimensions of the weight tensor in a layer-wise fashion. Both methods have been shown to achieve significant NN parameter compression and a consequent reduction in training time.…”
Section: Introduction
confidence: 99%
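A back-of-the-envelope parameter count shows why adjusting the tensor order before a Tucker factorization can pay off. The mode sizes and ranks below are arbitrary choices for illustration, not the layer-wise procedure of [17]:

```python
import numpy as np

W = np.zeros((512, 256))
full = W.size                                            # 131072 parameters

# 2nd-order Tucker (equivalent to a truncated SVD) with rank 32:
rank2 = 32
params_2d = 512 * rank2 + 256 * rank2 + rank2 * rank2    # factors + core

# reshape to a 4th-order tensor with balanced modes, rank 8 per mode:
T = W.reshape(16, 32, 16, 16)
rank4 = 8
params_4d = (16 + 32 + 16 + 16) * rank4 + rank4 ** 4     # factors + core

print(full, params_2d, params_4d)  # 131072 25600 4736
```

Raising the order shrinks each factor matrix dramatically while the core grows only as the product of the per-mode ranks, so a well-chosen reshape can cut the parameter count by a further order of magnitude.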
“…The authors of the seminal paper [12] are the only ones to have analytically re-derived back-propagation [18] in terms of the tensor factors stemming from the TT decomposition; however, the dimensionality and order of the employed TTs were chosen arbitrarily, so that, despite the achieved compression, the results were not physically interpretable. Most other research in the field tends to rely on automatic differentiation for back-propagation [13], [14], [16], [17]. Although this is convenient for implementation purposes, as automatic differentiation is quite efficient, there remains a need for a deep analytical understanding of how errors are propagated; this is fundamental to the explainability of DNNs.…”
Section: Introduction
confidence: 99%
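The TT decomposition referenced above can be sketched with the standard TT-SVD procedure (a sequence of truncated SVDs over successive unfoldings); the interface and rank-selection convention here are assumptions for illustration:

```python
import numpy as np

def tt_svd(T, ranks):
    """TT-SVD: sequential truncated SVDs yield cores G_k of shape
    (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_d = 1."""
    dims = T.shape
    d = len(dims)
    cores = []
    r_prev = 1
    M = T.reshape(dims[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(ranks[k], len(S))                 # truncate to requested rank
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # carry the remaining factor forward and re-fold the next mode
        M = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    T = cores[0]                                  # (1, n_0, r_1)
    for G in cores[1:]:
        T = np.einsum('...r,rns->...ns', T, G)    # chain over the shared rank
    return T.reshape(T.shape[1:-1])               # drop the boundary 1-ranks
```

With ranks large enough to avoid truncation the reconstruction is exact; truncating the ranks is what produces the compression, and the approximation error that fine-tuning or analytical back-propagation must then contend with.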