Proceedings of the 50th Annual International Symposium on Computer Architecture 2023
DOI: 10.1145/3579371.3589057
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction

Cited by 9 publications (3 citation statements)
References 20 publications
“…Some works [17,24,32,34,44,45] lay more emphasis on accelerating sparse attention. They design specialized architectures to fully utilize the pre-defined static attention pattern [17,32] or dynamically generated attention pattern [34,37,44,45]. Recently, FACT [37] points out the importance of compressing linear layers with mixed-precision quantization to help reduce latency.…”
Section: Related Work
confidence: 99%
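As background for the static attention patterns mentioned in the statement above, the following is a minimal, self-contained sketch (not taken from FACT or any of the cited designs) of a pre-defined sparse attention mask: a sliding window plus a few global tokens. The window size, number of global tokens, and helper names are illustrative assumptions.

```python
# Illustrative sketch only: a pre-defined static sparse attention pattern
# (sliding window + global tokens). Parameters are assumptions for exposition,
# not values taken from FACT or the other cited accelerator designs.
import numpy as np

def static_sparse_mask(seq_len: int, window: int = 4, n_global: int = 1) -> np.ndarray:
    """Boolean mask where True marks query/key pairs that are attended to."""
    idx = np.arange(seq_len)
    # Local band: each query attends to keys within +/- `window` positions.
    local = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global tokens: the first n_global tokens attend to / are attended by all.
    glob = np.zeros((seq_len, seq_len), dtype=bool)
    glob[:n_global, :] = True
    glob[:, :n_global] = True
    return local | glob

def sparse_attention(q, k, v, mask):
    """Dense reference that zeroes out masked positions; a specialized
    accelerator would skip the masked score computations entirely."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d = 16, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((seq_len, d)) for _ in range(3))
out = sparse_attention(q, k, v, static_sparse_mask(seq_len))
print(out.shape)  # (16, 8)
```

Because such a mask is fixed ahead of time, the hardware can hard-wire which score positions to compute, which is what makes the static-pattern designs cited above amenable to specialized architectures; dynamically generated patterns instead require on-the-fly prediction of the important positions.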
“…They design specialized architectures to fully utilize the pre-defined static attention pattern [17,32] or dynamically generated attention pattern [34,37,44,45]. Recently, FACT [37] points out the importance of compressing linear layers with mixed-precision quantization to help reduce latency. However, these methods cannot accelerate the decode stage of LLMs, since they mainly focus on the prefill stage for discriminative models such as the medium-sized BERT [16] model.…”
Section: Related Work
confidence: 99%
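To illustrate the kind of linear-layer compression this statement refers to, below is a minimal NumPy sketch of mixed-precision weight quantization. The per-output-channel rule that keeps high-dynamic-range channels in int8 and the rest in int4, along with the outlier_frac parameter and helper names, is an illustrative assumption, not FACT's actual bit-allocation policy.

```python
# Illustrative sketch only: mixed-precision weight quantization for a linear
# layer. The int4/int8 selection rule below is an assumption for exposition,
# not FACT's actual mixed-precision scheme.
import numpy as np

def quantize_channel(w_row: np.ndarray, bits: int):
    """Symmetric per-channel quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w_row))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w_row / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def mixed_precision_quantize(W: np.ndarray, outlier_frac: float = 0.25):
    """Keep the channels with the largest dynamic range in int8, rest in int4."""
    ranges = np.max(np.abs(W), axis=1)
    cutoff = np.quantile(ranges, 1.0 - outlier_frac)
    quantized = []
    for row, r in zip(W, ranges):
        bits = 8 if r >= cutoff else 4
        q, scale = quantize_channel(row, bits)
        quantized.append((q, scale, bits))
    return quantized

def dequant_linear(x: np.ndarray, quantized) -> np.ndarray:
    """Reference dequantize-then-matmul; hardware would run integer kernels."""
    W_hat = np.stack([q.astype(np.float32) * s for q, s, _ in quantized])
    return x @ W_hat.T

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)).astype(np.float32)
x = rng.standard_normal((4, 128)).astype(np.float32)
y = dequant_linear(x, mixed_precision_quantize(W))
print(y.shape)  # (4, 64)
```

A real accelerator would execute the low-bit matrix multiplications directly in integer arithmetic rather than dequantizing to float as this reference does, which is where the latency reduction comes from.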