“…QGT does not depend on straight-through estimators (STEs), which greatly simplifies implementation. Moreover, unlike previous regularizer-based approaches [9,12,33], QGT applies its regularizer directly to the weight values rather than to the quantized values, so there is no need to learn the scale of the quantized weights separately. Through the choice of regularizer, QGT can enforce properties such as the clustering of weight values into quantized bins, which lets it accommodate non-linear, hardware-specific quantizers.…”
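The following is a minimal sketch in Python of what "clustering of weight values into quantized bins" through a regularizer on the raw weights can look like. The excerpt does not give QGT's actual regularizer. The penalty below is a hypothetical choice: the mean squared distance from each raw weight to its nearest level. The power-of-two level grid and the helper names `nearest_level`, `cluster_penalty`, `cluster_penalty_grad`, and `regularize` are also assumptions made for illustration. The penalty is a function of the raw weights, so its gradient is defined directly (everywhere except the midpoints between levels) and no STE is needed. The level set can be any non-uniform grid.

```python
from typing import List


def nearest_level(w: float, levels: List[float]) -> float:
    """Return the quantization level closest to the raw weight w."""
    return min(levels, key=lambda q: abs(w - q))


def cluster_penalty(weights: List[float], levels: List[float]) -> float:
    """Hypothetical clustering regularizer: mean squared distance of each
    raw weight to its nearest level (zero when every weight sits on a level)."""
    return sum((w - nearest_level(w, levels)) ** 2 for w in weights) / len(weights)


def cluster_penalty_grad(weights: List[float], levels: List[float]) -> List[float]:
    """Gradient of cluster_penalty with respect to the raw weights.
    The nearest level is piecewise constant, so each term differentiates to
    2 * (w - q) / n; no straight-through estimator is involved."""
    n = len(weights)
    return [2.0 * (w - nearest_level(w, levels)) / n for w in weights]


def regularize(weights: List[float], levels: List[float],
               lr: float = 0.5, steps: int = 50) -> List[float]:
    """Plain gradient descent on the penalty alone (hypothetical demo loop)."""
    w = list(weights)
    for _ in range(steps):
        g = cluster_penalty_grad(w, levels)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


if __name__ == "__main__":
    # Non-uniform (power-of-two style) grid, standing in for a hardware-specific quantizer.
    levels = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]
    w0 = [0.9, -0.3, 0.1, 0.6]
    w = regularize(w0, levels)
    print(cluster_penalty(w0, levels), cluster_penalty(w, levels))
    print([nearest_level(x, levels) for x in w])  # prints [1.0, -0.25, 0.0, 0.5]
```

In actual training, a term like this would presumably be added to the task loss with some weighting coefficient. Here it is minimized alone so the effect is easy to see. Each weight moves toward its nearest level on the non-uniform grid, the levels stay fixed, and no quantization scale is learned.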