2021 Symposium on VLSI Circuits
DOI: 10.23919/vlsicircuits52068.2021.9492420
A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation

Cited by 17 publications (11 citation statements). References: 0 publications.
“…Output sparsity exploitation during the WG stage yields large benefits because it both avoids useless computation and removes memory accesses. For this reason, recent energy-efficient training processors [26,31,51,78] supported triple sparsity exploitation combined with iterative pruning.…”
Section: Pruning-Aware Output Zero Skipping During the WG (mentioning)
confidence: 99%
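The quoted statement credits output-zero skipping during the weight-gradient (WG) stage with both compute and memory savings. As a rough illustration only, not the processor's actual dataflow, the hypothetical NumPy sketch below shows how zero activations and zero local gradients let a weight-gradient accumulation skip MACs and the corresponding operand fetches; all function names and shapes are assumptions.

```python
import numpy as np

def weight_gradient_zero_skip(acts, errs):
    """Hypothetical sketch of output-zero skipping during the WG stage:
    dW[i, j] = sum_b acts[b, i] * errs[b, j].
    MACs whose activation or error operand is zero are skipped, which also
    avoids fetching the matching operand pair from memory."""
    batch, n_in = acts.shape
    _, n_out = errs.shape
    dW = np.zeros((n_in, n_out))
    macs_done = 0
    for b in range(batch):
        # Only iterate over nonzero operands (e.g. activations zeroed by
        # ReLU, local gradients zeroed by pruning); zero entries contribute
        # nothing to the accumulation and are never touched.
        nz_i = np.flatnonzero(acts[b])
        nz_j = np.flatnonzero(errs[b])
        for i in nz_i:
            for j in nz_j:
                dW[i, j] += acts[b, i] * errs[b, j]
                macs_done += 1
    return dW, macs_done
```

A dense baseline would always issue batch * n_in * n_out MACs, so the returned macs_done count makes the savings from sparsity in either operand directly visible.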
“…[17][18][19] were proposed to unify the data representation of both the input operands and the accumulation. Flexpoint [55] tried to substitute FXP for FP representation using a shared-exponent management algorithm to simplify the MAC design, but it failed to reduce the required bit precision below 16 bits.…”
Section: A New Number Representation (mentioning)
confidence: 99%
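The second statement contrasts floating point with a shared-exponent fixed-point format of the kind Flexpoint explored. The sketch below is a generic block-floating-point illustration, assuming a single exponent shared across a tensor block with fixed-point mantissas; it is not Flexpoint's actual shared-exponent management algorithm, and all names and the 16-bit default are assumptions chosen to match the bit width mentioned in the quote.

```python
import numpy as np

def to_shared_exponent(block, mantissa_bits=16):
    """Illustrative block floating-point conversion: one exponent is shared
    by the whole tensor block and each element keeps only a fixed-point
    mantissa of mantissa_bits."""
    max_mag = np.max(np.abs(block))
    if max_mag == 0:
        return np.zeros_like(block, dtype=np.int32), 0
    # Choose the shared exponent so the largest magnitude fits the mantissa range.
    shared_exp = int(np.ceil(np.log2(max_mag))) - (mantissa_bits - 1)
    scale = 2.0 ** shared_exp
    # Clip handles the edge case where max_mag is an exact power of two.
    mantissas = np.clip(np.round(block / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def from_shared_exponent(mantissas, shared_exp):
    """Dequantize back to floating point for inspection."""
    return mantissas.astype(np.float64) * (2.0 ** shared_exp)
```

Quantizing and dequantizing a block shows that values much smaller than the block maximum lose precision, which is consistent with the quoted observation that such schemes still needed at least 16-bit mantissas for training.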