In recent years, the demand for higher-performance deep neural networks (DNNs) has kept growing. Prior work approaches faster and more efficient DNNs from different angles, such as model pruning [28,29,31], kernel factorization [3,14,40], and data quantization [45,50]. Among these efforts, quantization-based DNN acceleration [45,46,50] stands out for requiring minimal modification of the original model architecture, lowering memory consumption, and improving runtime performance.
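To make the memory-consumption claim concrete, the following is a minimal sketch of uniform affine quantization, one common form of data quantization: a float32 tensor is mapped to 8-bit integers via a scale and zero point, shrinking storage by 4x at the cost of a small rounding error. The function names and the NumPy-based setup are illustrative assumptions, not the method of any particular cited work.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, num_bits: int = 8):
    """Affine (asymmetric) uniform quantization of a float tensor to uint8.

    Illustrative sketch: maps [x.min(), x.max()] onto [0, 2^num_bits - 1].
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1024).astype(np.float32)
q, s, z = quantize_uniform(x)
x_hat = dequantize(q, s, z)
# uint8 storage is 4x smaller than float32 for the same number of elements
assert q.nbytes * 4 == x.nbytes
```

Because the quantized weights stay integer-valued at inference time, hardware can use cheaper integer arithmetic, which is where the runtime-performance benefit mentioned above comes from.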