“…There are two major methodologies in neural network compression: i) structured pruning and ii) knowledge distillation. Existing work on structured pruning [4,5,6,7,8,9,17,18,21,26,49] addresses only the width of a layer, removing filters according to their importance. However, selecting what to prune from a large CNN is an NP-hard problem: to find the optimal solution, one would need to rank each filter by turning it off and performing inference over all the samples.…”
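The exhaustive ranking the excerpt alludes to can be sketched on a toy model; this is a minimal NumPy illustration (all names and the toy linear "layer" are hypothetical, not from the cited works), showing why ablating each filter and re-running inference over every sample quickly becomes expensive:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a conv layer: F filters, each feeding a linear head.
F, N = 8, 200                      # number of filters, evaluation samples
X = rng.normal(size=(N, F))        # per-filter feature responses
w = rng.normal(size=F)             # downstream weights
y = (X @ w > 0).astype(int)       # labels the unpruned model classifies correctly

def accuracy(mask):
    """Accuracy of the model with filters zeroed wherever mask == 0."""
    preds = ((X * mask) @ w > 0).astype(int)
    return (preds == y).mean()

# Exhaustive single-filter ranking: turn each filter off in turn and run
# inference on all N samples -- O(F * N) forward passes even for this
# one-at-a-time relaxation, and combinatorial if filter subsets are
# scored jointly, which is what makes the exact problem intractable.
base = accuracy(np.ones(F))
drops = [base - accuracy(np.where(np.arange(F) == i, 0.0, 1.0))
         for i in range(F)]
ranking = np.argsort(drops)        # least important filters first
```

Real pruning methods avoid this cost by using cheap proxies for importance (e.g. filter norms or gradient-based scores) instead of per-filter ablation.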