2021
DOI: 10.48550/arxiv.2107.03356
Preprint
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

Cited by 3 publications (5 citation statements)
References 10 publications
“…Various methods exist to select the candidate weights for removal, including magnitude pruning [20], which selects the weights with the lowest absolute values, and gradient-based methods, which use the gradient at each weight to identify those trending towards zero fastest. Among the gradient-based methods are first-order techniques based on first-derivative information [31, 38], and second-order ones [9, 21, 23], which aim to find the set of weights whose removal causes the minimum increase in the network's loss. Second-order methods have proven effective for pruning convolutional networks in the past, and they have recently been optimized for Large Language Models (LLMs) [21].…”
Section: Network Pruning
confidence: 99%
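The magnitude-pruning criterion described in the statement above (zero out the weights with the smallest absolute values) can be sketched in a few lines. This is an illustrative sketch only, not code from any of the cited papers; the function name and the global unstructured-sparsity formulation are assumptions for the example.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: pruning half of a tiny weight vector removes the two
# entries with the smallest magnitudes (0.1 and -0.05).
w = np.array([0.1, -0.5, 2.0, -0.05])
pruned = magnitude_prune(w, 0.5)  # -> [0.0, -0.5, 2.0, 0.0]
```

Note that this criterion looks only at weight magnitudes; the gradient-based and second-order methods cited above replace the `|w|` score with saliency measures derived from derivative information.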
“…Second-order pruning methods, e.g. [14, 22, 35, 54, 56], augment this basic metric with second-order information, which can lead to higher accuracy of the resulting pruned models relative to GMP.…”
Section: Sparsification Techniques
confidence: 99%
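A common way such second-order methods score weights is the classic Optimal Brain Damage-style saliency, which estimates the loss increase from zeroing a weight under a local quadratic model with a diagonal Hessian approximation. The sketch below is a hedged illustration of that general idea, not the specific algorithm of M-FAC or any single cited paper; the function names and the diagonal-Hessian simplification are assumptions for the example.

```python
import numpy as np

def obd_saliency(weights: np.ndarray, hessian_diag: np.ndarray) -> np.ndarray:
    """Estimated loss increase from zeroing each weight w_i:
    0.5 * H_ii * w_i^2 (diagonal quadratic model)."""
    return 0.5 * hessian_diag * weights ** 2

def second_order_prune(weights: np.ndarray,
                       hessian_diag: np.ndarray,
                       sparsity: float) -> np.ndarray:
    """Remove the fraction `sparsity` of weights with the lowest saliency."""
    s = obd_saliency(weights, hessian_diag)
    k = int(sparsity * weights.size)
    idx = np.argsort(s)[:k]  # lowest saliency = cheapest to remove
    pruned = weights.copy()
    pruned[idx] = 0.0
    return pruned

# Example: a large weight in a flat (low-curvature) direction is cheaper
# to remove than a small weight in a sharp direction, so second-order
# pruning can disagree with pure magnitude pruning.
w = np.array([0.2, 1.0])
h = np.array([100.0, 0.01])  # curvature (diagonal Hessian) per weight
pruned = second_order_prune(w, h, 0.5)  # -> [0.2, 0.0]
```

The practical difficulty, which M-FAC and related work address, is that forming or inverting the full Hessian is infeasible at network scale, motivating matrix-free approximations of this second-order information.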
“…We measure the results of sparse transfer with full and linear finetuning on the same downstream tasks, starting from dense ImageNet models pruned using regularization-based and post-training pruning methods. Specifically, we use AC/DC, STR, and M-FAC [14], respectively.…”
Section: F Experiments on MobileNetV1
confidence: 99%
“…It is a popular technique to reduce the growing energy and performance costs of neural networks and to make it feasible to deploy them in resource-constrained environments such as smart devices. Various approaches to pruning have been developed as it has gained considerable attention over the past few years (Zhu & Gupta, 2017; Sui et al., 2021; Liebenwein et al., 2021; Peste et al., 2021; Frantar et al., 2021; Deng et al., 2020).…”
Section: Introduction
confidence: 99%