2019
DOI: 10.48550/arxiv.1905.08170
Preprint

DARC: Differentiable ARchitecture Compression

Abstract: In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC utilizes the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can…
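The abstract describes learning, from training data, which sub-components of a base architecture can be swapped for cheaper alternatives. A minimal sketch of that idea, in the style of a DARTS-like continuous relaxation, is shown below; the class and candidate operations here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

class DifferentiableChoice:
    """Hypothetical sketch: mix a resource-intensive sub-component with a
    cheaper alternative via softmax-weighted architecture parameters.
    After training the alphas, compression keeps only the dominant op."""
    def __init__(self, ops):
        self.ops = ops                   # candidate sub-components
        self.alpha = np.zeros(len(ops))  # learnable architecture parameters

    def forward(self, x):
        # differentiable relaxation: weighted sum over all candidates
        w = softmax(self.alpha)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def compress(self):
        # discretize: retain only the highest-weight candidate
        return self.ops[int(np.argmax(self.alpha))]

# candidates: a full dense transform vs. a cheap low-rank factorization
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
U, V = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))
choice = DifferentiableChoice([lambda x: x @ W, lambda x: x @ U @ V])

x = rng.normal(size=(4, 8))
y = choice.forward(x)   # smooth mixture of both candidates, shape (4, 8)
```

In a real system the alphas would be trained jointly with the model weights under a resource penalty; this sketch only illustrates the relaxation-then-discretize pattern the abstract alludes to.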

Cited by 2 publications (2 citation statements)
References 28 publications
“…A closely related line of work is Neural Architecture Search (NAS). It aims to efficiently search the space of architectures (Pham et al, 2018; Liu et al, 2018; Singh et al, 2019). Quantization is another technique to reduce the model size.…”
Section: Related Work
confidence: 99%
“…Most of these works fall primarily into one of four categories: quantization, low-rank factorization, sparse connections, and structured pruning. Besides compression methods, there has also been considerable work on architecture search for compute-efficient networks [22,29]. Recently, model compression techniques have also been applied to natural language processing models [19,17].…”
Section: Related Work
confidence: 99%