2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops
DOI: 10.1109/cvprw.2014.106

A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks


Cited by 259 publications (121 citation statements); references 13 publications.
“…Our implementation running on the Tegra K1 or on the [6,7,21]. Results for system (dark) and differential power (light).…”
Section: Results
Confidence: 99%
“…A growing number of researchers are proposing to address the recognition of actions and objects with brain-inspired algorithms featuring multi-stage feature detectors and classifiers which can be customized using machine learning [6,7,11]. These techniques, collectively known as deep learning, have recently achieved record-breaking results on highly challenging datasets using automatic (supervised or partially unsupervised) learning.…”
Section: Introduction
Confidence: 99%
“…One example of previous work that implements a weight-stationary dataflow is nn-X, or neuFlow [85], which uses eight 2-D convolution engines for processing a 10×10 filter. There are 100 MAC units in total, i.e.…”
Section: B. Energy-Efficient Dataflow for Accelerators
Confidence: 99%
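The excerpt above describes a weight-stationary engine in which the 100 taps of a 10×10 filter each occupy one MAC unit. A minimal NumPy sketch (not the nn-X hardware, just the arithmetic it parallelizes) showing that each output pixel of a K×K convolution costs K² multiply-accumulates:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct 2-D convolution, 'valid' padding, cross-correlation form
    (the usual convention in DNN accelerators). For a KxK kernel, each
    output pixel is a sum of K*K products -- 100 MACs when K = 10."""
    H, W = image.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # One output window: K*K multiplies feeding one accumulator.
            out[i, j] = np.sum(image[i:i + K, j:j + K] * kernel)
    return out
```

In a weight-stationary design the K² kernel values stay pinned in the MAC array while image pixels stream past, so the inner window product above maps one-to-one onto the physical multipliers.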
“…fpgaConvNet provides support for fixed-point as well as single- and double-precision floating-point representation. In the evaluation phase, Q8.8 fixed-point representation was used, which is also used in the FPGA works that we compare with and has been extensively tested in the literature to give results similar to neural networks implemented in 32-bit floating-point [6].…”
Section: Discussion
Confidence: 99%
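Q8.8 packs a signed value into 16 bits with 8 integer and 8 fractional bits (step size 2⁻⁸ ≈ 0.0039). A small illustrative sketch of the conversion and the re-scaling shift a fixed-point multiply needs (not fpgaConvNet code):

```python
def to_q88(x: float) -> int:
    """Quantize a float to Q8.8: scale by 2**8, round, saturate to int16."""
    raw = round(x * 256)
    return max(-32768, min(32767, raw))

def from_q88(q: int) -> float:
    """Recover the (approximate) real value from a Q8.8 integer."""
    return q / 256

def q88_mul(a: int, b: int) -> int:
    """Multiply two Q8.8 values: the raw product carries 16 fractional
    bits, so shift right by 8 to return to Q8.8 (truncating)."""
    return (a * b) >> 8
```

For example, `to_q88(1.5)` is 384, and `q88_mul(to_q88(1.5), to_q88(2.0))` gives 768, i.e. 3.0 in Q8.8.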
“…By targeting the larger Xilinx Virtex-6 VLX240T FPGA, NeuFlow achieved 147 GOp/s at 10W. Finally, in 2014, the design was ported to Xilinx Zynq XC7045 SoC under the name nn-X [6] where it achieved 200 GOp/s at 4W. Nevertheless, systolic implementations suffer from complex routing logic and can support convolutions only up to the maximum implemented kernel size, e.g.…”
Section: Related Work
Confidence: 99%
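The throughput and power figures quoted above imply a large jump in energy efficiency between the Virtex-6 NeuFlow design and the Zynq nn-X port; the arithmetic, spelled out:

```python
# GOp/s-per-watt for the two FPGA designs quoted in the excerpt above.
designs = {
    "NeuFlow on Virtex-6 VLX240T": (147, 10),  # (GOp/s, watts)
    "nn-X on Zynq XC7045": (200, 4),
}
for name, (gops, watts) in designs.items():
    print(f"{name}: {gops / watts:.1f} GOp/s per watt")
```

That is 14.7 versus 50.0 GOp/s per watt, roughly a 3.4× efficiency improvement from the port.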