2023
DOI: 10.1145/3586074

A Practical Survey on Faster and Lighter Transformers

Abstract: Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model solely based on the attention mechanism that is able to relate any two positions of the input sequence, hence modelling arbitrarily long dependencies. The Transformer has improved the state-of-the-art across numerous sequence modelling tasks. However, its effectiveness comes at t…
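The cost the abstract alludes to stems from the attention mechanism itself: every position is compared against every other. Below is a minimal NumPy sketch of standard scaled dot-product attention, softmax(QK^T / sqrt(d)) V; it illustrates the general mechanism only, and the shapes (n = 512, d = 64) and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    The score matrix is (n, n), so time and memory grow quadratically with
    the sequence length n -- the bottleneck that faster and lighter
    Transformer variants try to remove.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (n, n): every position attends to every other
    return softmax(scores) @ V      # (n, d)

# Illustrative shapes only.
n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (512, 64)
```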

Cited by 25 publications (13 citation statements)
References 29 publications
“…Additional research efforts, like the Pythia suite, are providing new tools to analyze LLMs and address this issue [66]. Other recent survey papers, such as [67], seek to address the issue of how to apply more efficient transformer methods to NLP tasks. Approaches are grouped together by sparse, factorized attention, and architectural change.…”
Section: How Large?
Mentioning, confidence: 99%
“…Approaches are grouped together by sparse, factorized attention, and architectural change. However, [67] concludes there are, ". .…”
Section: How Large?
Mentioning, confidence: 99%
“…This model has enabled researchers to approach textual data with novel methods and has become increasingly popular over time due to its effectiveness in acquiring contextual word representations, leading to numerous studies in this area. Upon reviewing the literature, it is evident that many studies typically focus on various aspects of transformer models, including their architecture, efficiency, computational power, memory efficiency, and the development of fast and lightweight variants [52]. On the other hand, in other studies, various NLP applications have been explored, including visualization of transformers for NLP [53], examination of pre-training methods used in transformer models [54], usage of transformers for text summarization tasks [55], application of transformer models for detecting different sentiment levels from text-based data [56], and using transformers for extracting useful information from large datasets [57].…”
Section: Deep Learning and Transformers
Mentioning, confidence: 99%
“…Transformers (Vaswani et al., 2017) have emerged as highly effective models for various tasks, but their widespread adoption has been limited by the quadratic cost of the self-attention mechanism and poor performance on long-range tasks. Researchers have pursued diverse approaches to overcome this challenge and to create efficient transformer architectures (Fournier et al., 2021). From the perspective of efficiency, techniques such as sparse attention, low-rank attention (Wang et al., 2020; Winata et al., 2020), kernel-based attention (Choromanski et al., 2020), recurrent mechanisms (Hutchins et al., 2022; Dai et al., 2019), and efficient IO-awareness-based implementation (Dao et al., 2022a) proved efficient.…”
Section: Long Range Transformers
Mentioning, confidence: 99%
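One of the families named in this excerpt, kernel-based attention, can be sketched by replacing the softmax with a feature map phi so that the (n, n) score matrix is never materialised. The NumPy sketch below uses a generic linear-attention formulation with an ELU+1 feature map; it is an illustrative approximation in the spirit of the kernel methods cited above, not the implementation of any specific cited paper, and all names and shapes are assumed for the example.

```python
import numpy as np

def phi(x):
    # ELU(x) + 1: a simple positive feature map (assumed here for illustration).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Kernel-based (linear) attention.

    out_i = phi(q_i) @ (phi(K)^T V) / (phi(q_i) @ sum_j phi(k_j))

    Only a (d, d) summary of keys and values is built, so the cost is
    O(n * d^2) time and O(d^2) extra memory instead of O(n^2).
    """
    Qp, Kp = phi(Q), phi(K)        # (n, d) positive features
    kv = Kp.T @ V                  # (d, d) key-value summary
    z = Qp @ Kp.sum(axis=0)        # (n,)   per-query normaliser
    return (Qp @ kv) / z[:, None]  # (n, d)

# Illustrative shapes only.
n, d = 2048, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (2048, 64)
```

The trade-off is that the feature map only approximates the softmax weighting, which is broadly where accuracy differences between such efficient-attention methods on long-range tasks tend to come from.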