2017 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc.2017.7870353
14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI

Abstract: ConvNets, or Convolutional Neural Networks (CNNs), are state-of-the-art classification algorithms, achieving near-human performance in visual recognition [1]. New trends such as augmented reality demand always-on visual processing in wearable devices. Yet, advanced ConvNets achieving high recognition rates are too expensive in terms of energy, as they require substantial data movement and billions of convolution computations. Today, state-of-the-art mobile GPUs and ConvNet accelerator ASICs [2][3] only demonstr…
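The title's "subword-parallel" refers to packing several low-precision operations into one wider arithmetic unit when accuracy requirements allow reduced bitwidth. As a minimal illustrative sketch (not the paper's actual datapath), two unsigned 4-bit multiplications can share a single wide multiplier by packing the operands with guard bits:

```python
# Hypothetical sketch of subword-parallel multiplication: two 4-bit
# operand pairs share one wide multiplier by packing with guard bits.
# This illustrates the general idea only, not Envision's circuit.

def subword_mul_2x4(a0, a1, b):
    """Multiply two unsigned 4-bit activations a0, a1 by one 4-bit
    weight b using a single wide multiplication. a1 is shifted left
    by 8 bits so the two 8-bit partial products cannot overlap."""
    assert all(0 <= x < 16 for x in (a0, a1, b))
    packed = a0 | (a1 << 8)          # pack both activations into one word
    wide = packed * b                # one multiplication instead of two
    p0 = wide & 0xFF                 # low product:  a0 * b
    p1 = (wide >> 8) & 0xFF          # high product: a1 * b
    return p0, p1
```

At low precision the same multiplier array thus delivers twice the throughput per operation, which is one way reduced accuracy can be traded for energy and speed.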

Cited by 333 publications (201 citation statements)
References 6 publications
“…3b), which is often used for an output-stationary dataflow, relies on another set of data dimensions to achieve high compute parallelism [29][30][31][32]: iacts are reused vertically and psums are accumulated horizontally. (b) Temporal accumulation array [15][16][17]: iacts are reused vertically and weights are reused horizontally.…”
Section: A. Challenges for Compact DNNs
confidence: 99%
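The output-stationary dataflow this citation describes keeps each partial sum (psum) fixed in a local accumulator while input activations (iacts) and weights stream past it. A minimal 1-D sketch of that accumulation pattern, under the assumption of valid (non-padded) convolution, might look like:

```python
# Sketch of an output-stationary 1-D convolution (illustrative only,
# not the cited hardware): each output position owns a stationary
# accumulator; iacts and weights stream past it over time.

def conv1d_output_stationary(iacts, weights):
    K = len(weights)
    out_len = len(iacts) - K + 1
    psums = [0] * out_len                 # one stationary psum per output
    for k, w in enumerate(weights):       # each weight broadcast in turn
        for o in range(out_len):
            psums[o] += iacts[o + k] * w  # psum stays put; iact is reused
    return psums
```

The design choice is that psums never move until they are complete, which minimizes read-modify-write traffic to larger memories at the cost of streaming iacts and weights repeatedly.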
“…While these approaches provide theoretical reductions in the size and number of operations and in storage cost, specialized hardware is often necessary to translate these theoretical benefits into measurable improvements in energy efficiency and processing speed. Support for reduced precision has been demonstrated in recent hardware implementations, including Envision [15], Thinker [16], UNPU [17], Loom [18], and Stripes [19]. These works have shown various methods that efficiently translate reduced bitwidths, from 16 bits down to 1 bit, into both energy savings and increases in processing speed.…”
Section: Introduction
confidence: 99%
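The bitwidth knob these accelerators expose corresponds, on the software side, to quantizing values onto a coarser fixed-point grid. As a hedged sketch (a generic uniform symmetric quantizer, not any specific chip's scheme), scaling from 16 bits down to 1 bit could be modeled as:

```python
# Generic uniform symmetric quantizer for values in [-1, 1), as an
# illustration of the precision knob the cited accelerators exploit.
# Not taken from any of the cited designs.

def quantize(x, bits):
    """Quantize x to a signed fixed-point grid of the given bitwidth.
    At 1 bit the grid degenerates to the sign, as in binary networks."""
    if bits == 1:
        return 1.0 if x >= 0 else -1.0
    levels = 1 << (bits - 1)          # e.g. 128 levels for 8 bits
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))  # clamp to representable range
    return q / levels
```

Lower bitwidths shrink multiplier area and switching activity roughly quadratically, which is why hardware support turns this software-level approximation into real energy savings.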
“…Mimicking the human brain's architecture and computational primitives to build intelligent information processing systems is the key goal of neuromorphic engineering research activities worldwide. While there have been several demonstrations of neuromorphic computational platforms using standard complementary metal-oxide-semiconductor (CMOS) technology [11][12][13][14][15][16][17][18][19][20] and demonstrations of nanoscale devices that mimic neuronal and synaptic dynamics [21][22][23][24][25][26][27][28][29][30], none of these have achieved the target energy-efficiency specifications necessary to build systems that can learn in real time and in the field [31].…”
Section: Introduction
confidence: 99%
“…Fig. 1 shows that embedded GPUs and general Neural Processing Units (NPUs) [2,3,4,5] consume too much energy for this task, both in their computations and in their off-chip DRAM accesses, while classical machine learning methods [6] are either inaccurate or lack the flexibility to cover multiple tasks. This paper introduces BinarEye: a CNN processor optimized for BinaryNets [1]: Neural Networks with weights and activations constrained to +1/-1.…”
Section: Introduction
confidence: 99%