“…To accelerate inference and reduce storage for large models without sacrificing performance, prior works propose compressing models with techniques including weight pruning [24], channel slimming [43,44], layer skipping [4,73], patterned or block pruning [17,35,40,42,49,50,51,52,56,57,82,84], and network quantization [12,18,30,31,32,38,75]. Specifically, these studies focus on compressing discriminative models for image classification, detection, or segmentation tasks.…”
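To make the first of these techniques concrete, the sketch below applies magnitude-based weight pruning to a single linear layer: weights whose absolute value falls below a data-dependent threshold are zeroed, trading a controlled accuracy loss for sparsity. This is a generic PyTorch illustration, not the method of any cited work; the layer sizes and the 50% sparsity target are arbitrary assumptions.

```python
import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Linear, sparsity: float = 0.5) -> None:
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights."""
    with torch.no_grad():
        w = layer.weight
        k = int(sparsity * w.numel())
        if k == 0:
            return
        # Threshold = k-th smallest absolute weight value.
        threshold = w.abs().flatten().kthvalue(k).values
        # Keep weights strictly above the threshold; ties are also pruned,
        # so realized sparsity can slightly exceed the target.
        mask = w.abs() > threshold
        w.mul_(mask)

# Hypothetical usage: prune half the weights of a small layer.
layer = nn.Linear(128, 64)
magnitude_prune(layer, sparsity=0.5)
print(f"realized sparsity: {(layer.weight == 0).float().mean():.2f}")
```

In practice such pruning is typically followed by fine-tuning to recover accuracy, and structured variants (e.g., the channel slimming or block pruning cited above) zero entire rows, columns, or blocks so that standard hardware can actually exploit the sparsity.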