Shervin Vakili scite author profile

Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolution layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movements; (2) maximizing the utilization factor of processing resources to perform convolutions. This work thus proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated by implementing convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3×3 and 1×1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms when performing convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture benefits from the structured sparsity in ResNet-50 to reduce the latency to 42.5 ms when half of the channels are pruned.

Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization

Farmahini-Farahani

Vakili

Fakhraie

et al. 2010

Engineering Applications of Artificial Intelligence

Power Reduction in CNN Pooling Layers with a Preliminary Partial Computation Strategy

Ahmadi

Vakili

Langlois

et al. 2018

Evolvable multi-processor: a novel MPSoC architecture with evolvable task decomposition and scheduling

Vakili

Fakhraie

Mohammadi

2010

An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs

Ahmadi

Vakili

Langlois

2020

Convolutional Neural Networks (CNNs) have shown outstanding accuracy for many vision tasks in recent years. When deploying CNNs on portable devices and embedded systems, however, the large number of parameters and computations results in long processing time and low battery life. An important factor in designing CNN hardware accelerators is to efficiently map the convolution computation onto hardware resources. In addition, to save battery life and reduce energy consumption, it is essential to reduce the number of DRAM accesses, since a DRAM access consumes orders of magnitude more energy than other operations in hardware. In this paper, we propose an energy-efficient architecture which maximally utilizes its computational units for convolution operations while requiring a low number of DRAM accesses. The implementation results show that the proposed architecture performs one image recognition task using the VGGNet model with a latency of 393 ms and only 251.5 MB of DRAM accesses.

Customized embedded processor design for global photographic tone mapping

Vakili

Gil

Langlois

et al. 2011

A Low-Cost Fault-Tolerant Approach for Hardware Implementation of Artificial Neural Networks

Ahmadi

Sargolzaie

Fakhraie

et al. 2009

Shervin Vakili

Enhanced Precision Analysis for Accuracy-Aware Bit-Width Optimization Using Affine Arithmetic

CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture
