2020
DOI: 10.1109/tc.2019.2941875
Fast and Efficient Convolutional Accelerator for Edge Computing

Cited by 42 publications (36 citation statements)
References 34 publications
“…During the past few years, common trends in machine learning accelerator design have included providing higher memory bandwidth [22], efficient dataflow mapping [23], in-memory computing [24], and skipping ineffectual computations [25].…”
Section: A Design Of Efficient Hardware Accelerators
confidence: 99%
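As a rough illustration of the last trend named above, the sketch below shows what skipping ineffectual (zero-valued) computations means for a simple dot product. It is a minimal illustrative example, not code from the cited works.

```python
# Minimal sketch of zero-skipping: accumulate only products whose
# activation is non-zero, so "ineffectual" multiply-accumulates are skipped.
from typing import Sequence

def dot_skip_zeros(activations: Sequence[float], weights: Sequence[float]) -> float:
    acc = 0.0
    for a, w in zip(activations, weights):
        if a == 0.0:      # this MAC would contribute nothing
            continue      # skip it entirely
        acc += a * w
    return acc

# ReLU outputs are often sparse, so many MACs can be skipped in practice.
print(dot_skip_zeros([0.0, 1.5, 0.0, 2.0], [0.3, 0.4, 0.5, 0.6]))  # 1.8
```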
“…However, as DNNs grow (recently, models with hundreds of billions of parameters have been developed [26]), off-chip DRAM, despite its long access latency and high energy consumption, becomes indispensable. To address this, recent work incorporates a local buffer for each PE along with a global buffer shared by all PEs, enabling fast, energy-efficient data accesses, as such buffers can consume up to two orders of magnitude less energy per access than DRAM [21], [25], [27]-[35].…”
Section: A Design Of Efficient Hardware Accelerators
confidence: 99%
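A back-of-the-envelope sketch of why such buffer hierarchies pay off follows. The roughly two-orders-of-magnitude per-access gap is taken from the statement above; the specific picojoule values and reuse counts are illustrative assumptions only.

```python
# Illustrative energy accounting for on-chip reuse vs. repeated DRAM fetches.
DRAM_PJ   = 200.0   # assumed energy per DRAM access (pJ), hypothetical value
BUFFER_PJ = 2.0     # assumed energy per on-chip buffer access (pJ), ~100x less

def access_energy_pj(dram_accesses: int, buffer_accesses: int) -> float:
    return dram_accesses * DRAM_PJ + buffer_accesses * BUFFER_PJ

# Fetch 1,000 weights from DRAM once, then reuse each 100 times from the buffer,
# versus re-fetching every operand from DRAM.
with_reuse    = access_energy_pj(dram_accesses=1_000,   buffer_accesses=100_000)
without_reuse = access_energy_pj(dram_accesses=100_000, buffer_accesses=0)
print(with_reuse, without_reuse)  # 400000.0 vs 20000000.0 -> ~50x energy saving
```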
“…Among them, representative dataflow techniques are the weight-stationary (WS) [9], [10], output-stationary (OS) [11], [12], row-stationary (RS) [13], [14], and no-local-reuse (NLR) [15], [16], [17] dataflows. However, they fail to exploit the full performance potential of their architectures due to the limited data bandwidth of devices [32]. The bandwidth bottleneck prevents these architectures from supplying the parallelism their PEs require immediately after each access to off-chip memory.…”
Section: Introduction
confidence: 99%
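As a minimal sketch of one of the dataflows named above, the following illustrates a weight-stationary 1-D convolution: each weight stays fixed in a PE while the inputs stream past it, so weights are fetched from memory only once. The structure and names are illustrative assumptions, not the architecture of the cited works.

```python
# Weight-stationary (WS) dataflow sketch for a 1-D convolution.
from typing import List

def conv1d_weight_stationary(inputs: List[float], weights: List[float]) -> List[float]:
    K = len(weights)
    out_len = len(inputs) - K + 1
    outputs = [0.0] * out_len
    # Outer loop models the PEs: weight w stays "stationary" in PE k.
    for k, w in enumerate(weights):
        # Inner loop streams the inputs through the PE, accumulating partial sums.
        for i in range(out_len):
            outputs[i] += w * inputs[i + k]
    return outputs

print(conv1d_weight_stationary([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2.0, -2.0, -2.0]
```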