Summary
Low-bit neural networks (LBNNs) are a promising technique for enriching the intelligent applications running on sustainable Cyber-Physical Systems (CPS). Although LBNNs offer low memory usage, fast inference, and low power consumption, low-bit designs require additional computation units and may cause a large accuracy drop. In this paper, we design a Field-Programmable Gate Array (FPGA)-based LBNN accelerator to support sustainable CPS. First, we propose a method to quantize neural networks into 2-bit weights, 8-bit activations, and 8-bit biases with little accuracy loss. A mapping function gradually approximates the discrete weight space, and the activations and biases are quantized through an improved straight-through estimator. Second, we design a bitwise FPGA-based accelerator to speed up the LBNN. Unlike traditional acceleration techniques, which focus mainly on the convolution layers, our design considers the dataflows of the fully connected, pooling, and convolution layers, so all layers of the network are accelerated. A 2×8-bit bitwise multiplier implemented with AND/XOR operations replaces the 32×32-bit multiplication unit, yielding faster inference and lower power consumption. We conduct extensive experiments on the MNIST, CIFAR-10, CIFAR-100, and ImageNet benchmarks to evaluate the efficiency of our approach. The LBNNs obtained by our quantization method save 93.75% of memory with an average accuracy loss of 2.26% compared with the original networks. The FPGA-based accelerator achieves a peak performance of 427.71 GOPS at a 100 MHz working frequency, significantly outperforming previous approaches.
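To make the quantization recipe concrete, the sketch below shows one way a 2-bit weight mapping and an 8-bit straight-through estimator (STE) are commonly implemented. The threshold-based ternary codebook, the per-tensor scale, and the function names are illustrative assumptions; the paper's actual mapping function approximates the discrete weight space gradually during training and may differ in detail.

```python
import numpy as np

def quantize_weights_2bit(w, t=0.7):
    """Map full-precision weights onto a 2-bit codebook.

    ASSUMPTION: a scaled ternary codebook {-s, 0, +s} with threshold
    t * mean(|w|); the paper's gradual mapping function may differ.
    """
    thresh = t * np.mean(np.abs(w))
    mask = (np.abs(w) > thresh).astype(w.dtype)          # weights kept nonzero
    s = np.sum(np.abs(w) * mask) / max(np.sum(mask), 1)  # per-tensor scale
    return s * np.sign(w) * mask                         # values in {-s, 0, +s}

def quantize_activations_8bit(x, scale):
    """Uniform 8-bit quantization of activations (forward pass)."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale

def ste_grad(x, scale, upstream):
    """Straight-through estimator backward pass: gradients flow unchanged
    wherever the input fell inside the representable 8-bit range."""
    inside = (x / scale >= -128) & (x / scale <= 127)
    return upstream * inside

w = np.random.randn(64, 64).astype(np.float32)
wq = quantize_weights_2bit(w)   # each entry drawn from {-s, 0, +s}
```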
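The 2×8 bitwise multiplier rests on the observation that a 2-bit weight can only pass, zero, or negate an 8-bit activation, so a full multiplier array is unnecessary. The software sketch below assumes a ternary (sign, magnitude) encoding of the 2-bit weight and shows one standard AND/XOR realization of the conditional negate-and-mask; the paper's exact hardware datapath is not reproduced here.

```python
def mul_2x8(act, sign_bit, mag_bit):
    """Multiply a signed 8-bit activation by a 2-bit weight encoded as
    (sign_bit, mag_bit): weight = 0 if mag_bit == 0, else +1 if
    sign_bit == 0, else -1.

    ASSUMPTION: a common AND/XOR datapath for ternary weights; the
    paper's multiplier circuit may differ.
    """
    s = -sign_bit   # 0 -> all-zeros word, 1 -> all-ones word
    m = -mag_bit    # 0 -> zeroing mask,   1 -> pass-through mask
    # Two's-complement conditional negate: (x XOR s) + (s AND 1),
    # then AND with the magnitude mask to zero the product when w == 0.
    return ((act ^ s) + (s & 1)) & m

assert mul_2x8(45, sign_bit=0, mag_bit=1) == 45    # w = +1
assert mul_2x8(45, sign_bit=1, mag_bit=1) == -45   # w = -1
assert mul_2x8(45, sign_bit=0, mag_bit=0) == 0     # w =  0
```

Because the datapath reduces to an XOR, an increment, and an AND, it costs far less area and power than a generic 32×32-bit multiplier, which is the trade-off the summary points to.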