“…General model compression approaches fall into several categories [12]: pruning [21,63], quantization [64,56,16], knowledge distillation [27,44], and their combinations [61,69,71]. A Binary Neural Network (BNN) [13,14,34,73,51,36,48,44,29,72,39,25,37,10,57,20,66] represents the most extreme form of model quantization, as it quantizes the weights in convolution layers to only 1 bit, achieving substantial speed-ups over its full-precision counterpart. [50] roughly divides the previous BNN literature into two categories: (i) native BNNs [13,14,34], which directly binarize a full-precision model with a pre-defined binarization function.…”
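To make the "pre-defined binarization function" concrete, below is a minimal sketch of the sign-based weight binarization commonly used in native BNNs, with a per-filter scaling factor in the style of XNOR-Net-like methods. The function name and the choice of scaling (the mean absolute weight per output channel) are illustrative, not a reproduction of any one cited method.

```python
import numpy as np

np.random.seed(0)

def binarize_weights(w):
    """Binarize a weight tensor to {-alpha, +alpha} (illustrative sketch).

    b     = sign(w)                      -- 1-bit weights in {-1, +1}
    alpha = mean |w| per output channel  -- scaling that minimizes
                                            ||w - alpha * b||^2
    """
    # Sign binarization: map each weight to +1 or -1.
    b = np.where(w >= 0, 1.0, -1.0)
    # Channel-wise scaling factor (one alpha per output filter).
    alpha = np.abs(w).mean(axis=tuple(range(1, w.ndim)), keepdims=True)
    return alpha * b

# Example: a 4D conv weight of shape (out_channels, in_channels, kH, kW).
w = np.random.randn(8, 3, 3, 3)
wb = binarize_weights(w)
```

At inference time, storing only the 1-bit signs plus one scalar per filter is what yields the memory and speed benefits; the multiply-accumulate in a binarized convolution reduces to XNOR and popcount operations.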