Deep neural networks (DNNs) have become the most important and popular machine learning technique in the emerging artificial intelligence era. Because of their inherently large scale, DNN models are both computation and storage intensive, posing significant challenges for efficient deployment. A promising solution to this problem is to build customized hardware accelerators that improve the processing speed and energy efficiency of DNN execution. However, designing the architecture of a specialized DNN accelerator is nontrivial, given the massive amount of data movement, the rapid evolution of DNN algorithms and models, the high demand for reconfigurability and programmability, and the strict requirement to preserve accuracy.
Many design solutions, spanning the device, circuit, architecture, and algorithm levels, have been proposed and implemented in recent years. This article reviews digital CMOS-based DNN hardware architectures. By analyzing the design requirements and challenges of DNN accelerators within the classical von Neumann framework, we introduce the basic underlying hardware architecture and computation mapping strategy. Building on this foundation, we then describe advanced optimization techniques. Finally, open problems and challenges for future DNN hardware architectures are analyzed and elaborated.