“…a) Efficient Neural Network: Several different approaches have been proposed to reduce the memory footprint, latency, and power of modern neural network (NN) architectures. These techniques can be broadly categorized into (1) model pruning [18,31,35,38,40,67], (2) knowledge distillation [21,39,43,49,70], (3) efficient neural architecture design [23,24,37,51,57], (4) hardware and neural architecture co-design [16,17,22,29,64], and (5) quantization [5,7,8,14,15,27,34,48,60,66,72,73].…”