Deep Convolutional Neural Network (CNN)-based methods have become increasingly powerful across a wide variety of applications, particularly in natural language processing and computer vision. Nevertheless, CNN-based methods are computationally expensive and resource-hungry, and are therefore difficult to deploy on battery-operated devices such as smartphones, AR/VR glasses, and autonomous robots. Moreover, with the growing complexity of deep learning models such as ResNet-50, there is increasing demand for efficient hardware accelerators to handle the computational workload. In this paper, we present the design and implementation of a neural network accelerator tailored for ResNet-50 on the ZCU102 platform using Field-Programmable Gate Arrays (FPGAs), which offer a customizable solution to this challenge. We systematically investigate design choices and optimization strategies for deploying, on FPGA-based accelerators, a custom-built ResNet-50 network trained for Indian Sign Language translation of 76 gestures enacted and recorded in our labs for a doctor-patient interface. To enhance operational speed, we employ several techniques, including parallelism, pipelining, and depthwise separable convolution. Furthermore, we implement hierarchical memory allocation at different offsets using threads. We also apply weight and data quantization to improve throughput while minimizing resource consumption, thereby achieving low power consumption with acceptable inference accuracy. We evaluated our FPGA-accelerated model against a CPU baseline on several performance metrics: frames per second (fps), memory allocation, and LUT, DSP, and block RAM utilization. Our findings underscore the advantages of FPGA-based accelerators: we achieve a frame rate of 2.7 fps on the Xilinx UltraScale+ ZCU102 platform with int8 quantization and 0.8 fps with single precision, compared to 0.6 fps on the CPU. Notably, we observed an accuracy variation of only 1.37% with int8 quantization, while no accuracy variation was observed for single precision. Our implementation used 16 convolution threads and 4 fully connected (FC) threads operating at 200 MHz for single precision, and 25 convolution threads and 16 FC threads operating at 250 MHz for int8.
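To illustrate the depthwise separable convolution mentioned above, the following is a minimal PyTorch sketch, not the implementation deployed on the FPGA; the class name DepthwiseSeparableConv and the layer sizes are illustrative assumptions. The idea is to factor a standard KxK convolution into a per-channel (depthwise) KxK convolution followed by a 1x1 (pointwise) convolution, which reduces multiply-accumulate operations by roughly a factor of 1/C_out + 1/K^2 relative to the standard convolution.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel KxK convolution
    (groups == in_channels) followed by a 1x1 pointwise convolution
    that mixes channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_ch, in_ch, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_ch, bias=False)  # one KxK filter per input channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Hypothetical drop-in replacement for a 3x3, 64 -> 128 convolution layer.
x = torch.randn(1, 64, 56, 56)
y = DepthwiseSeparableConv(64, 128)(x)
print(y.shape)  # torch.Size([1, 128, 56, 56])
```

For the 3x3, 64-to-128 layer above, this factorization needs about 1/128 + 1/9, roughly 12%, of the multiply-accumulates of the standard convolution, which is what makes it attractive for resource-constrained FPGA deployment.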