“…Network Compression. Generally, compression methods can be categorized into five types: quantization [3,31,7,43,1], knowledge distillation [14,23,41,53,32], low-rank decomposition [38,6,22,56], weight sparsification [10,26,51], and filter pruning [34,27,13,40]. Quantization methods accelerate deep CNNs by replacing high-precision floating-point operations with low-precision fixed-point ones, which usually incurs a significant accuracy drop.…”
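
To make the quantization trade-off concrete, the following is a minimal sketch of symmetric uniform quantization in pure Python. It is not the scheme of any of the cited works; the function names `quantize` and `dequantize`, the sample weights, and the choice of a single per-tensor scale are all assumptions made for illustration. It shows how mapping floats to a small integer range introduces rounding error, and how that error grows as the bit width shrinks.

```python
def quantize(weights, num_bits=8):
    """Symmetric uniform quantization (illustrative, per-tensor scale)."""
    qmax = 2 ** (num_bits - 1) - 1            # largest representable integer level
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [v * scale for v in q]


def max_error(weights, num_bits):
    """Largest absolute reconstruction error at a given bit width."""
    q, scale = quantize(weights, num_bits)
    restored = dequantize(q, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.003, -1.27, 0.55]

q8, s8 = quantize(weights, num_bits=8)
q2, s2 = quantize(weights, num_bits=2)
err8 = max_error(weights, 8)
err2 = max_error(weights, 2)

print("8-bit levels:", q8, "max error:", err8)
print("2-bit levels:", q2, "max error:", err2)
```

In this sketch the 8-bit reconstruction error stays within half a quantization step, while the 2-bit version collapses most weights to zero. This is one simple way to see why aggressive low-precision schemes tend to lose accuracy.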