2014
DOI: 10.48550/arxiv.1412.6553
Preprint

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

Abstract: We propose a simple two-step approach for speeding up convolution layers within large convolutional neural networks based on tensor decomposition and discriminative fine-tuning. Given a layer, we use non-linear least squares to compute a low-rank CP-decomposition of the 4D convolution kernel tensor into a sum of a small number of rank-one tensors. At the second step, this decomposition is used to replace the original convolutional layer with a sequence of four convolutional layers with small kernels. After suc…
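As a concrete illustration of the replacement described in the abstract (a 1×1 channel-projecting convolution, a K×1 and a 1×K per-component convolution, and a 1×1 channel-restoring convolution built from the CP factors), here is a minimal sketch in PyTorch. It is not the authors' code: TensorLy's ALS-based parafac stands in for the non-linear least-squares solver used in the paper, the function name cp_decompose_conv and the rank choice are placeholders, and stride-1 convolutions are assumed.

```python
# Minimal sketch (not the authors' implementation): replace one Conv2d with
# the four-layer CP sequence described in the abstract. TensorLy's ALS-based
# parafac is used here in place of the paper's non-linear least-squares solver.
import torch
import torch.nn as nn
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')

def cp_decompose_conv(layer: nn.Conv2d, rank: int) -> nn.Sequential:
    # Kernel tensor has shape (C_out, C_in, kH, kW); stride 1 is assumed.
    W = layer.weight.data
    c_out, c_in, kh, kw = W.shape

    # CP decomposition into `rank` rank-one terms; the factors have shapes
    # (C_out x R), (C_in x R), (kH x R), (kW x R).
    weights, (f_out, f_in, f_h, f_w) = parafac(W, rank=rank, init='random')
    f_out = f_out * weights  # absorb the CP scaling into the output factor

    # 1) 1x1 convolution: C_in -> R channels.
    conv_in = nn.Conv2d(c_in, rank, kernel_size=1, bias=False)
    conv_in.weight.data = f_in.t().reshape(rank, c_in, 1, 1)

    # 2) kH x 1 grouped convolution applied separately to each CP component.
    conv_h = nn.Conv2d(rank, rank, kernel_size=(kh, 1),
                       padding=(layer.padding[0], 0), groups=rank, bias=False)
    conv_h.weight.data = f_h.t().reshape(rank, 1, kh, 1)

    # 3) 1 x kW grouped convolution applied separately to each CP component.
    conv_w = nn.Conv2d(rank, rank, kernel_size=(1, kw),
                       padding=(0, layer.padding[1]), groups=rank, bias=False)
    conv_w.weight.data = f_w.t().reshape(rank, 1, 1, kw)

    # 4) 1x1 convolution: R -> C_out channels (reuses the original bias).
    conv_out = nn.Conv2d(rank, c_out, kernel_size=1,
                         bias=layer.bias is not None)
    conv_out.weight.data = f_out.reshape(c_out, rank, 1, 1)
    if layer.bias is not None:
        conv_out.bias.data = layer.bias.data

    return nn.Sequential(conv_in, conv_h, conv_w, conv_out)
```

In use, one would swap a convolutional layer of a trained network, e.g. model.features[3] = cp_decompose_conv(model.features[3], rank=16), and then carry out the abstract's second step: fine-tuning the whole network with standard backpropagation to recover accuracy.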

Cited by 129 publications (213 citation statements)
References 16 publications
“…We take a model and perform tensor decomposition on all convolutional kernels and FC layers except the last layer. This idea was first studied in Lebedev et al (2014) on larger, more compressible architectures. We report our results in Figure 7.…”
Section: Low-rank Only (mentioning)
Confidence: 99%
“…Therefore, in this paper we target memory cost reduction. In this area low-rank tensor compression is a popular approach (Garipov et al, 2016;Novikov et al, 2015;Lebedev et al, 2014) that can achieve orders of magnitude compression, but can lead to significant accuracy loss.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Different tensor decompositions result in different TNNs, e.g., CP-Nets [24], Tucker-Nets [4], [21], HT-Nets [48], BT-Nets [46], TT-Nets [7], [30] and TR-Nets [44], corresponding to CANDECOMP/PARAFAC (CP) decomposition [13], Tucker decomposition [41], Hierarchical Tucker (HT) decomposition [10], Block-Term (BT) decomposition [5], Tensor Train (TT) [32] and Tensor Ring (TR) [50], respectively. Tjandra et al [39] show that the TT-format has a better performance compared with the Tucker-format with the same number of parameters required in the recurrent neural network (RNN).…”
Section: Introduction (mentioning)
Confidence: 99%
“…Many methods have been proposed for CNN compression. For example: weight quantization [2,4], tensor low-rank factorization [23,29], network pruning [14,13,61,19], and knowledge distillation [21,48]. Among them all, a combination of channel pruning and knowledge distillation is the preferable method to learn smaller dense models, which can easily leverage Basic Linear Algebra Subprograms (BLAS) libraries [31].…”
Section: Introduction (mentioning)
Confidence: 99%