2021 24th Euromicro Conference on Digital System Design (DSD)
DOI: 10.1109/dsd53832.2021.00074

Improving the Efficiency of Transformers for Resource-Constrained Devices

Abstract: Transformers provide promising accuracy and have become popular in various domains such as natural language processing and computer vision. However, due to their massive number of model parameters and their memory and computation requirements, they are not suitable for resource-constrained low-power devices. Even with high-performance and specialized devices, memory bandwidth can become a performance-limiting bottleneck. In this paper, we present a performance analysis of state-of-the-art vision transformer…
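To make the abstract's memory argument concrete, here is a minimal sketch (not from the paper) of how transformer weight storage scales with numeric precision; the ~86M parameter count is the commonly cited figure for the ViT-Base model family and is an assumption here, not a number taken from this paper.

```python
# Rough illustration: memory footprint of a transformer's weights
# at different numeric precisions.
def weight_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Return the weight storage size in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)

# ViT-Base has roughly 86M parameters (public figure for the model family).
vit_base_params = 86_000_000
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_mb(vit_base_params, nbytes):.0f} MiB")
```

Even at int8, the weights alone occupy tens of MiB, which illustrates why moving them across a narrow memory bus can dominate inference time on low-power devices.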


Cited by 12 publications (5 citation statements) · References 28 publications
“…The controlled-NOT, or CNOT, gate is a two-qubit gate, see (5) and Figure 2, that facilitates multi-qubit entanglement [17]. In this work, we will provide a complexity (depth) analysis of the proposed circuits in terms of the critical path of consecutive single-qubit gates and two-qubit CNOT gates [17].…”
Section: Controlled-NOT (CNOT) Gate
confidence: 99%
“…In the convolution layer, filters are applied to input data for specific applications, and the pooling layers reduce the spatial dimensions in the generated feature maps [3]. The reduced spatial dimensions generated from the pooling layers reduce memory requirements, which is a major concern for resource-constrained devices [4,5].…”
Section: Introduction
confidence: 99%
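The citation above notes that pooling layers shrink feature maps and thereby cut memory requirements. A minimal sketch of that arithmetic (the channel and spatial sizes are illustrative assumptions, not values from either paper):

```python
def feature_map_elems(channels: int, height: int, width: int) -> int:
    """Number of elements in a C x H x W feature map."""
    return channels * height * width

# A 2x2 pooling window with stride 2 halves each spatial dimension,
# so the feature map shrinks by a factor of 4.
before = feature_map_elems(64, 56, 56)  # before pooling
after = feature_map_elems(64, 28, 28)   # after 2x2/stride-2 pooling
print(before // after)
```

Multiplied by the number of layers and the bytes per activation, this 4x reduction per pooling stage is what makes the intermediate-activation footprint manageable on resource-constrained devices.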
“…It relies on a massive number of model parameters and excessive memory and computation requirements [1], [27], [20]. The representation learned by the model on the HAR and Epilepsy datasets is also shown in Figure 3.…”
Section: Baselines
confidence: 99%
“…However, constrained by the shared memory capacity, it can only outperform FasterTransformer in a very limited range and starts to degrade severely as the sequence length grows. In addition, Hamid et al. [21] improve the efficiency of transformer inference for resource-constrained devices.…”
Section: Related Work
confidence: 99%