2018
DOI: 10.1109/tcsi.2017.2735490

A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things

Abstract: Convolutional neural network (CNN) offers significant accuracy in image detection. To implement image detection using CNN in Internet of Things (IoT) devices, a streaming hardware accelerator is proposed. The proposed accelerator optimizes energy efficiency by avoiding unnecessary data movement. With a unique filter decomposition technique, the accelerator can support arbitrary convolution window sizes. In addition, the max pooling function can be computed in parallel with convolution by using separat…

Cited by 179 publications (141 citation statements)
References 14 publications
“…• User identification ⇒ DNN-based algorithm • Tracking accuracy improvement [125] • IoT devices management ⇒ SNN-based algorithm • Image detection [126] • Errors in collected data • Data relationship extraction ⇒ RNN-based RL algorithm • Data sampling [127] • Real-time training for ANNs • Modeling autonomous M2M communication ⇒ FNN and SNN based algorithm • Entity state prediction [128] • Target surveillance [129] environment. For example, a user displays only the visible portion of a 360° video and, hence, transmitting the entire 360° video frame can waste the capacity-limited bandwidth.…”
Section: IoT (mentioning)
confidence: 99%
“…Finally, Table III compares similar accelerators, showing the process, the maximum frequency, the logic area, the amount of on-chip memory, the bit-width for inputs, and the GOPS (for reported frequency and normalized to 1 GHz). In terms of gate count and performance, Lupulus achieves the highest peak performance among the considered accelerators, at a 32 % and 39 % lower gate count compared to [5] and [8], respectively. However, we use 14 % more gates than [20] with only a 4 % improvement in GOPS/GHz.…”
Section: IV-B Benchmarks (mentioning)
confidence: 99%
“…Relation to Previous Work: Many existing accelerators, such as [5]- [8] have small local memories in the PEs for the weights, inputs, and partial sums, which may leave the local memories for the partial sums underutilized when partial sums are forwarded to a neighboring PE instead of being stored in the PE itself. Moreover, the partial sums may have to be read out to a high-level memory and then sent back later if the local memories are too small.…”
Section: Introduction (mentioning)
confidence: 99%
“…Deep learning using convolutional and fully connected neural networks has achieved unprecedented accuracy on many modern artificial intelligence (AI) applications, such as image, voice, and DNA pattern detection and recognition [1][2][3][4][5][6][7][8][9][10]. However, one of the major problems that hinder its commercial feasibility is that neural networks require great computational resources and memory bandwidth even for very simple tasks.…”
Section: Introduction (mentioning)
confidence: 99%