2020
DOI: 10.1109/jproc.2020.2976475

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey

Abstract: Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific to ViTs, aiming to optimize their performance by tailoring both the algorithmic structure and the underlying hardwa…

Cited by 608 publications (315 citation statements) | References 200 publications
“…As DNN networks become deeper and more complex, the required computing power and energy consumption are also increasing [7]-[9]. Since most endpoint devices are battery-powered, energy-efficient ASICs that can process DNNs are highly required.…”
Section: I (mentioning)
confidence: 99%
“…Index matching matches the coordinate of a weight against the coordinate of an input feature map (ifmap) pixel for each operation, to determine whether the operation is meaningful, i.e., whether it produces a non-zero product term. However, the irregular distribution of non-zero data leads to a large matching overhead in parallel processing, since accelerators have to match multiple coordinates in parallel [9].…”
Section: I (mentioning)
confidence: 99%
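To make the index-matching step above concrete, here is a minimal Python sketch for the sparse inner-product case (e.g., along the input-channel dimension). It is a toy illustration under assumed data layouts, not the matching hardware of any cited accelerator; the function name `sparse_inner_product` and the example values are hypothetical.

```python
# Minimal sketch of index matching for a sparse inner product: non-zero weights
# and non-zero ifmap pixels are kept in coordinate form, and a multiplication is
# issued only when a weight coordinate matches an ifmap coordinate, i.e. only
# for "meaningful" operations that can produce a non-zero product term.

def sparse_inner_product(weights, ifmap):
    """weights, ifmap: dicts mapping a coordinate (e.g., channel index) to a non-zero value."""
    # Iterate over the smaller operand and look up matches in the larger one;
    # in hardware this lookup is the coordinate-matching overhead, which grows
    # when many coordinates must be matched in parallel.
    small, large = (weights, ifmap) if len(weights) <= len(ifmap) else (ifmap, weights)
    return sum(val * large[coord] for coord, val in small.items() if coord in large)

# Hypothetical example: channel indices with only a few non-zero entries.
w = {0: 0.5, 2: -1.0, 5: 0.25}     # non-zero weights (coordinate -> value)
x = {2: 4.0, 3: 1.0, 5: 2.0}       # non-zero ifmap pixels (coordinate -> value)
print(sparse_inner_product(w, x))  # matches at coords 2 and 5: -4.0 + 0.5 = -3.5
```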
“…[106] The error from each layer might accumulate and cause accuracy loss or non-convergence. [144] In addition, end-to-end network adaptation to array non-ideal factors is an interesting line of thinking. Hybrid training takes greater account of the energy and complexity of the mapping device.…”
Section: Challenges and Outlook (mentioning)
confidence: 99%
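As a rough illustration of how per-layer error can accumulate through a network, the following toy NumPy sketch perturbs each layer's weights with an assumed additive-noise model for a non-ideal array; it is not the hybrid-training scheme the citation refers to, and the depth, width, and noise level are arbitrary assumptions.

```python
import numpy as np

# Toy sketch of layer-wise error accumulation: each layer's weight matrix is
# perturbed to mimic a non-ideal analog array, and the deviation from the
# ideal (noise-free) activations is measured after every layer.
rng = np.random.default_rng(0)

depth, width, noise_std = 8, 64, 0.05
weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
x_ideal = rng.standard_normal(width)
x_noisy = x_ideal.copy()

for d, W in enumerate(weights, start=1):
    x_ideal = np.tanh(W @ x_ideal)                          # ideal layer output
    W_eff = W + noise_std * rng.standard_normal(W.shape)    # non-ideal array weights
    x_noisy = np.tanh(W_eff @ x_noisy)
    # The deviation typically grows with depth, illustrating error accumulation.
    print(f"layer {d}: deviation = {np.linalg.norm(x_noisy - x_ideal):.4f}")
```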
“…There have also been significant strides in the development of hardware accelerators for SNNs [116], [117], [118], CNNs [119], [120], [121], GNNs [122], [123] and training accelerators [124], [125], [126]. A comprehensive survey of the topic can be found in [127], [121]. We also refer the reader to recent research on attention networks [128] used in image captioning applications, transformers [129] used in natural language processing and on neural architecture search [130] to design neural network configurations that reduce the complexity of the network.…”
Section: Conclusion and Summary (mentioning)
confidence: 99%