2021 Symposium on VLSI Circuits
DOI: 10.23919/vlsicircuits52068.2021.9492347
CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference
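
The headline specs imply a power envelope of roughly 0.42 W. A minimal sketch of that arithmetic, assuming the quoted peak throughput (0.92 TOPS) and efficiency (2.2 TOPS/W) were measured at the same operating point, which the title alone does not guarantee:

```python
# Back-of-the-envelope power estimate from the headline specs.
# Assumption: peak throughput and efficiency refer to the same
# operating point.
peak_tops = 0.92       # tera-operations per second
tops_per_watt = 2.2    # energy efficiency

power_w = peak_tops / tops_per_watt
print(f"Implied power envelope: {power_w:.2f} W")  # ~0.42 W
```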

Cited by 37 publications (19 citation statements). References 0 publications.
“…The overhead of inserting our annotations for dFC can be small compared to what designers already insert to optimize the design. For ISmartDNN, for example, the total number of annotations is 304, which is 2.8% of the … [results-table residue omitted: per-benchmark annotation counts and timeouts for mean128/mean64/mean32 [33], dnn [57], keypair [58], gsm [59], nv_large [12], nv_small [12], and HLSCNN [60]] … A-QED² RB checks are performed on all sub-accelerators regardless of batch size, so P is omitted compared to Table I.…”
Section: Results
confidence: 99%
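
The quoted 2.8% figure implies a baseline of roughly 10,900 designer-inserted annotations. A minimal sketch of that arithmetic, assuming the percentage is taken relative to a total annotation count; the garbled passage above cuts off the actual denominator, so this is illustrative only:

```python
# Implied denominator behind the quoted 2.8% overhead figure.
# Assumption: 2.8% is relative to some total annotation count;
# the garbled table residue above hides the real denominator.
dfc_annotations = 304
overhead_fraction = 0.028

implied_total = dfc_annotations / overhead_fraction
print(f"Implied baseline: ~{implied_total:,.0f} annotations")  # ~10,857
```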
“…The latency and throughput of EfficientNetV2-S are the best, at 1.70 ms and 588.2 inferences/s, respectively. The energy efficiency of 128.1 inferences/s/W is comparable to the 132.3 inferences/s/W of the on-chip resistive RAM implementation [38], whereas throughput is 202× higher.…”
Section: Implementation and Measurements
confidence: 95%
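
The quoted comparison can be sanity-checked from the numbers given. A minimal sketch, assuming single-stream operation (throughput is the reciprocal of latency) and that the 202× ratio is taken on raw inferences/s; the CHIMERA-side throughput is not quoted, so it is back-solved here:

```python
# Sanity check of the quoted comparison against CHIMERA [38].
latency_ms = 1.70            # citing design's per-inference latency
throughput = 588.2           # citing design's inferences/s
efficiency = 128.1           # citing design's inferences/s/W
chimera_efficiency = 132.3   # CHIMERA's inferences/s/W (quoted)
speedup = 202                # quoted throughput advantage

# Latency and throughput are self-consistent: 1 / 1.70 ms ~= 588 inf/s.
print(f"1/latency = {1000 / latency_ms:.1f} inferences/s")

# Back-solved CHIMERA throughput implied by the 202x claim.
print(f"Implied CHIMERA throughput: {throughput / speedup:.2f} inferences/s")

# Efficiency ratio supporting the "comparable" claim (~0.97x).
print(f"Efficiency ratio vs CHIMERA: {efficiency / chimera_efficiency:.2f}x")

# Implied power draw of the citing design at peak throughput (~4.6 W).
print(f"Implied power: {throughput / efficiency:.2f} W")
```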
“…Better algorithms [24] [2] are designed to use resources more efficiently, improve model performance, and optimize deployment in real-world environments. Ultra-low-power AI chips [10] and accelerators [15] have also been proposed to support always-on ML for extended periods on battery power. However, a joint design of hardware and algorithm [30] [13] is required to squeeze out performance, since TinyML delivers ML solutions to constrained devices with limited resources.…”
Section: Related Work
confidence: 99%