2022 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc42614.2022.9731686

A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing

Cited by 30 publications (5 citation statements)
References 2 publications

“…With the smallest number of PEs, our design does not achieve the highest throughput or the corresponding energy efficiency, since our throughput target is to meet the required real-time constraints. Existing transformer-based designs only optimize transformer attention execution by exploiting the sparsity of attention [17]-[19], [21], rather than the whole model as in this work. In addition, our design must optimize for CNN, transformer, and GRU at the same time, which is not addressed in previous designs.…”
Section: Hardware Implementation Results
confidence: 99%
“…This design utilizes a systolic array for swift self-attention computation and extends native support for both LN and softmax operations. On another note, [21] put forth a transformer processor designed to bypass weakly related tokens, targeting enhanced energy efficiency. However, this approach introduces an irregular and intricate computing structure.…”
Section: Deep Learning Accelerators
confidence: 99%
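The token-bypassing idea described in the excerpt above can be made concrete with a small sketch. The NumPy snippet below is a software toy of skipping weakly related tokens in self-attention: key/value pairs whose attention weight falls below a threshold are dropped before the weighted sum. The function name, threshold value, and tensor shapes are illustrative assumptions; this is not the dataflow of [21] or of the ISSCC 2022 processor.

```python
import numpy as np

def sparse_attention(Q, K, V, threshold=0.02):
    """Toy self-attention that bypasses weakly related tokens.

    Key/value pairs whose softmax weight falls below `threshold` are
    skipped before the weighted sum. The threshold is an illustrative
    hyper-parameter, not a value taken from the cited papers.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n_q, n_k) raw attention scores
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)      # softmax over keys

    keep = probs >= threshold                       # speculate which tokens matter
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum(axis=-1, keepdims=True)      # renormalize surviving weights

    skipped = 1.0 - keep.mean()                     # fraction of query-key pairs bypassed
    return probs @ V, skipped

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 64)) for _ in range(3))
out, skipped = sparse_attention(Q, K, V)
print(f"output shape: {out.shape}, bypassed pairs: {skipped:.1%}")
```

In hardware, the speculation would take place before the full score computation so that the skipped multiply-accumulates are never issued; the sketch only illustrates the masking and the achievable skip ratio.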
“…In this paper, the T_si of the last four stages are set as 0, while the T_si of the remaining stages are set as 0.1. Finally, the bit-width is set as [10,10,11,12,12,13,14,15,16]. We utilize the frame-length adaptive MFCC structure, which is proposed in our previous work [27], and the architecture is shown in Fig.…”
Section: Stage-by-stage Bit-width Selection Algorithm
confidence: 99%
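As a rough illustration of what a per-stage bit-width schedule like the one quoted above means in practice, the sketch below quantizes each pipeline stage's output to its assigned word length. The helper name, the signed fixed-point format, and the placement of the binary point are assumptions for illustration; the quoted selection algorithm itself (how T_si drives the choice) is not reproduced here.

```python
import numpy as np

def quantize_stage(x, bits, frac_bits=None):
    """Quantize one stage's output to a signed fixed-point word of `bits` bits.

    Illustrative helper: the fraction split (binary point position) is an
    assumption, not a detail taken from the cited work.
    """
    if frac_bits is None:
        frac_bits = bits - 1                        # assume values roughly in [-1, 1)
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale   # round and saturate

# Per-stage bit-width schedule from the excerpt above.
stage_bits = [10, 10, 11, 12, 12, 13, 14, 15, 16]
rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, 256)
for bits in stage_bits:
    # Stand-in for the real stage computation (e.g. one MFCC pipeline stage),
    # followed by quantization to that stage's selected bit-width.
    signal = quantize_stage(signal, bits)
print("final value range:", signal.min(), signal.max())
```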
“…Since adjacent frames share similar information, efficiently leveraging video temporal correlations to minimize the computing cost of video models is worth exploring. In ISSCC'20, Yuan [8] proposed an inter-frame data-reuse processor for video acceleration. Rather than directly inputting the original frames, the work processes the difference feature between two frames in each CNN layer to reduce the redundant computation.…”
Section: AI Chips for Image or Video Processing
confidence: 99%
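A minimal sketch may help make the inter-frame data-reuse idea concrete. Because convolution is linear, conv(frame N) = conv(frame N-1) + conv(frame N - frame N-1), and when adjacent frames are similar the difference is mostly near zero, so most of the work can be skipped. The snippet below (NumPy/SciPy) mimics this for a single 2-D convolution; the threshold, array sizes, and function names are illustrative assumptions, not details of [8].

```python
import numpy as np
from scipy.signal import convolve2d

def conv_with_frame_reuse(prev_frame, curr_frame, kernel, prev_output, eps=1e-3):
    """Toy inter-frame data reuse for one convolution layer.

    conv(curr) = conv(prev) + conv(curr - prev), so only the sparse
    difference has to be processed. `eps` is an illustrative threshold
    for treating small differences as zero.
    """
    delta = curr_frame - prev_frame
    delta[np.abs(delta) < eps] = 0.0                # sparsify the inter-frame difference
    skippable = (delta == 0).mean()                 # fraction of positions a delta engine could skip
    out = prev_output + convolve2d(delta, kernel, mode="same")
    return out, skippable

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
frame0 = rng.standard_normal((64, 64))
frame1 = frame0 + 0.001 * rng.standard_normal((64, 64))    # nearly identical next frame

out0 = convolve2d(frame0, kernel, mode="same")              # full computation for the first frame
out1, skippable = conv_with_frame_reuse(frame0, frame1, kernel, out0)
print(f"skippable positions in the delta: {skippable:.1%}")
```

Zeroing small differences trades a small approximation error for sparsity; an actual accelerator would decide this threshold (or keep exact deltas) according to its accuracy budget.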