2021
DOI: 10.48550/arxiv.2111.01697
Preprint

Low-Rank+Sparse Tensor Compression for Neural Networks

Abstract: Low-rank tensor compression has been proposed as a promising approach to reduce the memory and compute requirements of neural networks for their deployment on edge devices. Tensor compression reduces the number of parameters required to represent a neural network weight by assuming network weights possess a coarse higher-order structure. This coarse structure assumption has been applied to compress large neural networks such as VGG and ResNet. However modern state-of-the-art neural networks for computer vision… Show more

Cited by 2 publications (4 citation statements)
References 15 publications

“…The proposed HMC results in a precision loss of only 0.11% at 2.07x. When the compression ratio reaches 5.56x (higher than the 5.36x and 5.50x reported in the literature [38,39]), HMC incurs only 0.94% accuracy loss, which is much lower than ATMC's 2.01% and the 1.44% in [39]. This shows that our method achieves a higher compression ratio and better compression results on ImageNet.…”
Section: PLOS ONE (mentioning)
confidence: 61%
“…In the existing work, some scholars have studied the compression of the VGG network [37] and the ResNet network [38] by combining low-rank decomposition and sparse representation, while others have studied low-rank + sparse weight compression of SOTA architectures that rely on efficient depthwise-separable convolutions [39]. These methods apply additive low-rank plus sparse compression to the weights of the neural network, as shown in Fig 1, and can obtain better compression results than sparse compression or low-rank decomposition alone.…”
Section: Introduction (mentioning)
confidence: 99%
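
To make the additive structure referenced in the quote above concrete, below is a hedged one-shot sketch of splitting a weight into a low-rank part plus a sparse residual. The cited works learn both parts jointly during training; the rank and sparsity values here are assumptions for illustration.

```python
# Minimal sketch of additive low-rank + sparse weight compression,
# W ~= A @ B + S with S sparse.  The cited works learn both parts during
# training; this is a one-shot post-hoc decomposition for illustration only.
import numpy as np

def lowrank_plus_sparse(W, rank, sparsity):
    """Greedy split: truncated SVD for the low-rank part, then keep the
    largest-magnitude residual entries as the sparse part."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]
    B = Vt[:rank, :]
    R = W - A @ B                               # residual after low-rank fit
    k = int(sparsity * W.size)                  # number of nonzeros to keep
    thresh = np.sort(np.abs(R), axis=None)[-k]
    S = np.where(np.abs(R) >= thresh, R, 0.0)
    return A, B, S

W = np.random.randn(256, 256)                   # stand-in weight matrix
A, B, S = lowrank_plus_sparse(W, rank=16, sparsity=0.05)
params = A.size + B.size + np.count_nonzero(S)  # factors + sparse nonzeros
print(params, W.size)                           # far fewer than 65,536
```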
“…Therefore, our method is crucial to remedy this drawback. Yu et al. (2017) and Hawkins et al. (2021) have applied low-rank plus sparse compression to CNNs. They mask out some kernels in a convolution layer as the sparse approximation and add two sequential convolutional layers that are parallel to the sparse convolutional layer as the low-rank approximation.…”
Section: Discussion (mentioning)
confidence: 99%
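
A hedged PyTorch sketch of the layer structure described in this quote: a convolution with some of its kernels masked out (the sparse branch) in parallel with two stacked convolutions that form a rank bottleneck (the low-rank branch), with the two outputs summed. The channel counts, rank, keep ratio, and the random mask are placeholders, not values or the training procedure from the cited works.

```python
# Sketch only: sparse branch = conv with whole kernels zeroed out; low-rank
# branch = 1x1 projection to `rank` channels followed by a kxk conv; the
# branch outputs are added.  The random mask stands in for a learned pattern.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankPlusSparseConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, rank, keep_ratio=0.1):
        super().__init__()
        pad = kernel_size // 2
        # Low-rank branch: two sequential convs acting as a rank bottleneck.
        self.lr_proj = nn.Conv2d(in_ch, rank, 1, bias=False)
        self.lr_conv = nn.Conv2d(rank, out_ch, kernel_size, padding=pad,
                                 bias=False)
        # Sparse branch: full conv whose (out, in) kernels are partly zeroed.
        self.sp_conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad,
                                 bias=False)
        mask = (torch.rand(out_ch, in_ch, 1, 1) < keep_ratio).float()
        self.register_buffer("mask", mask)   # placeholder random kernel mask
        self.pad = pad

    def forward(self, x):
        sparse_out = F.conv2d(x, self.sp_conv.weight * self.mask,
                              padding=self.pad)
        return self.lr_conv(self.lr_proj(x)) + sparse_out

layer = LowRankPlusSparseConv(in_ch=64, out_ch=64, kernel_size=3, rank=8)
y = layer(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```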
“…In that case, low-rank approximations are designed to store shared features across all coherent parts of neurons, and sparse approximations aim to learn distinct features from incoherent parts of neurons. In addition, previous work (Yu et al., 2017; Hawkins et al., 2021; Chen et al., 2021) applied a similar method to Convolutional Neural Networks (CNNs) and parameter-efficient fine-tuning, but we will discuss the limitations of their methods in Section 5.…”
Section: Introduction (mentioning)
confidence: 99%