Deep convolutional neural networks (CNNs) are the deep learning model of choice for object detection, classification, semantic segmentation, and natural language processing tasks. CNNs require billions of operations to process a frame. This computational complexity, combined with the inherent parallelism of the convolution operation, makes CNNs an excellent target for custom accelerators. However, when optimizing for different CNN hierarchies and data access patterns, it is difficult for custom accelerators to achieve close to 100% computational efficiency. In this work, we present Snowflake, a scalable and efficient accelerator that is agnostic to CNN workloads and is designed to always perform at near-peak hardware utilization. Snowflake achieves a computational efficiency of over 91% on modern CNN models. Snowflake, implemented on a Xilinx Zynq XC7Z045 SoC, is capable of a peak throughput of 128 G-ops/s and measured throughputs of 100 frames per second (120 G-ops/s) on the AlexNet CNN model, 36 frames per second (116 G-ops/s) on the GoogLeNet CNN model, and 17 frames per second (122 G-ops/s) on the ResNet-50 CNN model. To the best of our knowledge, Snowflake is the only implemented system capable of achieving over 91% efficiency on modern CNNs and the only implemented system with GoogLeNet and ResNet as part of the benchmark suite.
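As a rough sanity check on the figures quoted above, dividing the measured throughput by the frame rate recovers the implied per-frame workload of each network. The short Python sketch below does exactly that; the per-frame numbers are derived purely from the abstract's figures and are illustrative, not taken from the paper itself.

```python
# Back-of-the-envelope check of the reported throughput figures:
# measured G-ops/s divided by frames/s gives the implied work per frame.

PEAK_GOPS = 128.0  # peak throughput of the Zynq XC7Z045 implementation

measurements = {
    # model:      (frames per second, measured G-ops/s)
    "AlexNet":    (100, 120),
    "GoogLeNet":  (36, 116),
    "ResNet-50":  (17, 122),
}

for model, (fps, gops) in measurements.items():
    per_frame = gops / fps  # implied G-ops of work per frame
    print(f"{model:10s} {per_frame:5.2f} G-ops/frame "
          f"({gops} G-ops/s at {fps} fps, peak {PEAK_GOPS} G-ops/s)")
```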
In this paper, we present a memory-access-optimized routing scheme for a hardware-accelerated real-time implementation of deep convolutional neural networks (DCNNs) on a mobile platform. DCNNs consist of multiple layers of 3D convolutions, each comprising between tens and hundreds of filters, and these convolutions account for the most expensive operations in DCNNs. Systems that run DCNNs need to pass 3D input maps to the hardware accelerators for convolution, and they face the limitation of streaming data into and out of the hardware accelerator. Such bandwidth-limited systems require data reuse to utilize computational resources efficiently. We propose a new routing scheme for 3D convolutions that takes advantage of the characteristics of DCNNs to fully utilize all the resources in the hardware accelerator. This routing scheme is implemented on the Xilinx Zynq-7000 All Programmable SoC. The system fully exploits weight-level and node-level parallelization of DCNNs and achieves a peak performance 2x better than the previous routing scheme while running DCNNs.
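To illustrate the data-reuse opportunity the abstract refers to, the sketch below spells out a plain 3D-convolution loop nest in Python/NumPy. The comments mark the loops that an accelerator can parallelize across filters (reading "weight-level parallelism" as parallelism across filter weights and output channels is an assumption here, not the paper's definition) and the point where an input patch is reused by many filters, which is what makes on-chip reuse pay off on a bandwidth-limited system.

```python
import numpy as np

def conv_layer(inputs, weights):
    """Naive convolution layer: inputs (C, H, W), weights (M, C, K, K).

    Every input pixel is read by all M filters at up to K*K positions,
    so reusing input data on-chip instead of re-streaming it is what
    keeps a bandwidth-limited accelerator's compute units busy.
    """
    C, H, W = inputs.shape
    M, _, K, _ = weights.shape
    out_h, out_w = H - K + 1, W - K + 1
    outputs = np.zeros((M, out_h, out_w))

    for m in range(M):              # filters are independent -> parallel units
        for y in range(out_h):
            for x in range(out_w):  # output pixels are also independent
                # The (C, K, K) input patch below is identical for all M
                # filters at this (y, x) position: fetching it once and
                # broadcasting it to M multiply-accumulate units is the
                # kind of reuse a routing scheme can exploit.
                patch = inputs[:, y:y + K, x:x + K]
                outputs[m, y, x] = np.sum(patch * weights[m])
    return outputs

# Tiny example: 3 input maps of 8x8 pixels, 4 filters of 3x3.
out = conv_layer(np.random.rand(3, 8, 8), np.random.rand(4, 3, 3, 3))
print(out.shape)  # (4, 6, 6)
```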