2020
DOI: 10.1109/access.2020.3039858
Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Abstract: Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications, ranging from computer vision for medicine to autonomous driving of modern cars, as well as other sectors such as security, healthcare, and finance. However, to achieve impressive performance, these algorithms employ very deep networks, requiring significant computational power during both training and inference. A single inference of a DL model may require billions of multi…

Cited by 126 publications (69 citation statements)
References 246 publications (277 reference statements)
“…However, such models usually have high computational complexity and a large memory footprint, so they are not well suited for resource-constrained embedded systems. The use of cloud computing is inconvenient when operation-critical apparatus must be monitored, due to the low reliability and high latency of remote connections, which require sufficient bandwidth to guarantee real-time operation; general-purpose platforms using CPUs and GPUs have silicon sizes, prices, and energy costs that are incompatible with integration into the apparatus to be monitored [5]. Similar limitations affect dedicated processors, such as the Xilinx Deep Learning Processor Unit (DPU) core [20], introduced to accelerate CNN inference on FPGAs.…”
Section: Related Work
confidence: 99%
“…Recent DL approaches are based on Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Recurrent Neural Networks (RNN), etc., each having its own strengths and weaknesses [1]. However, deploying such networks in PdM systems is very challenging and costly because those models' very high computational complexity, memory requirements, and power consumption do not meet the needs of always-on monitoring of the apparatus [4], [5]. In this context, Auto-Encoders (AE) [6] are an obvious choice, since they combine relatively shallow networks with the possibility of unsupervised training, which is particularly attractive when labelled fault data are hard to obtain.…”
Section: Introduction
confidence: 99%
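
The quoted passage motivates shallow auto-encoders for predictive maintenance: train only on healthy-machine data, then flag faults by reconstruction error. Below is a minimal sketch of that idea (our illustration, not code from the cited paper; the layer sizes, optimizer settings, and threshold rule are assumptions):

```python
# Minimal sketch: a shallow auto-encoder for unsupervised fault detection.
# Sizes, optimizer, and threshold are illustrative assumptions.
import torch
import torch.nn as nn

class ShallowAE(nn.Module):
    def __init__(self, n_features: int, n_latent: int = 8):
        super().__init__()
        # One hidden layer each way keeps the network shallow,
        # matching the resource constraints discussed above.
        self.encoder = nn.Sequential(nn.Linear(n_features, n_latent), nn.ReLU())
        self.decoder = nn.Linear(n_latent, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_on_normal_data(model, loader, epochs: int = 20):
    # Unsupervised training: the AE only ever sees healthy-machine
    # signals, so no labelled fault data are required.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for (x,) in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), x)
            loss.backward()
            opt.step()

def is_anomalous(model, x, threshold: float) -> torch.Tensor:
    # At inference, a high reconstruction error signals a deviation
    # from the learned "normal" behaviour, i.e. a potential fault.
    with torch.no_grad():
        err = ((model(x) - x) ** 2).mean(dim=1)
    return err > threshold

if __name__ == "__main__":
    from torch.utils.data import DataLoader, TensorDataset
    normal = torch.randn(512, 16)  # stand-in for healthy sensor data
    model = ShallowAE(n_features=16)
    train_on_normal_data(model, DataLoader(TensorDataset(normal), batch_size=64))
    print(is_anomalous(model, torch.randn(4, 16) * 5, threshold=1.0))
```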
“…A commonly overlooked issue with advanced machine-learning methods is their energy consumption: model training and development likely account for a substantial portion of greenhouse-gas emissions because of the excessively large models used, for example, in natural language processing [30]. An important solution direction is to use special hardware designed for energy-efficient learning [31,32].…”
Section: Related Research on AI Issues
confidence: 99%
“…This increased complexity hinders the deployment of advanced NNs (DNNs and SNNs) on resource-constrained edge devices [4]. Therefore, optimizations at different system layers (i.e., HW and SW) are necessary to enable the use of advanced NNs at the edge [2]. Besides performance and energy efficiency, reliability and security aspects are also important to ensure…”
Section: Introduction
confidence: 99%
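
The quoted passage refers to software-layer optimizations only in general terms. One common example in this space is post-training quantization; the sketch below (our illustration under stated assumptions, not a method from the quoted paper or the survey) shows symmetric per-tensor 8-bit quantization, which cuts weight storage by 4x at a small accuracy cost:

```python
# Minimal sketch of symmetric per-tensor int8 post-training quantization,
# one example of a software-layer optimization for edge deployment.
import numpy as np

def quantize_int8(w: np.ndarray):
    # One per-tensor scale maps the largest |weight| to 127.
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # The rounding error introduced here is the accuracy cost
    # paid for the 4x reduction in weight storage.
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", float(np.abs(dequantize(q, s) - w).max()))
```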