2022
DOI: 10.48550/arxiv.2201.05991
Preprint

Video Transformers: A Survey

Abstract: Transformer models have shown great success modeling long-range interactions. Nevertheless, they scale quadratically with input length and lack inductive biases. These limitations can be further exacerbated when dealing with the high dimensionality of video. Proper modeling of video, which can span from seconds to hours, requires handling long-range interactions. This makes Transformers a promising tool for solving video-related tasks, but some adaptations are required. While there are previous works that stud…
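The abstract's point that Transformers "scale quadratically with input length" comes from the n × n score matrix that self-attention materializes. The following is a minimal NumPy sketch (not taken from the paper) of single-head self-attention, just to make the quadratic term concrete; the toy shapes and the video token count in the comment are illustrative assumptions.

```python
import numpy as np

def self_attention(x):
    """Naive single-head self-attention over n tokens of dimension d.

    The (n, n) score matrix below is the quadratic-in-n cost the
    abstract refers to; for video, n grows with frames x patches.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                   # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # (n, d) contextualized tokens

# Illustration: a 16-frame clip with 196 patches per frame already gives
# n = 16 * 196 = 3136 tokens, i.e. a ~3136 x 3136 attention matrix per head.
tokens = np.random.randn(8, 4)  # tiny example: n = 8, d = 4
out = self_attention(tokens)
```

Doubling the number of tokens quadruples the score matrix, which is why the video Transformers surveyed here need adaptations (restricted, factorized, or approximated attention) rather than the vanilla mechanism above.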



Cited by 7 publications (9 citation statements)
References 125 publications
“…Before being introduced into the field of time series prediction, Transformer has shown dominant performance in NLP and CV community [Vaswani et al., 2017; Kenton and Toutanova, 2019; Han et al., 2021; Han et al., 2020; Khan et al., 2021; Selva et al., 2022]. One of the key advantages Transformer holds in these fields is being able to increase prediction power through increasing model size.…”
Section: Model Size Analysis
confidence: 99%
“…Over the past few years, numerous Transformer variants have been proposed to advance the state-of-the-art performances of various tasks significantly. There are quite a few literature reviews from different aspects, such as in NLP applications [Han et al., 2021], CV applications [Han et al., 2020; Khan et al., 2021; Selva et al., 2022], efficient Transformers [Tay et al., 2020], and attention models [Chaudhari et al., 2021; Galassi et al., 2020].…”
1 https://github.com/qingsongedu/time-series-transformers-review
Section: Introduction
confidence: 99%
“…The scale of used data is much larger than traditional methods, but it's still limited. The pursuit…”

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models [38] | 2022 | arXiv | MM | DC, 19
A survey on vision transformer [39] | 2022 | TPAMI | CV | DC, 23
Transformers in vision: A survey [40] | 2021 | CSUR | CV | SC, 38
A Survey of Visual Transformers [41] | 2021 | arXiv | CV | DC, 21
Video Transformers: A Survey [42] | 2022 | arXiv | CV | DC, 24
Threats to Pre-trained Language Models: Survey and Taxonomy [43] | 2022 | arXiv | NLP | DC, 8
A survey on bias in deep NLP [44] | 2021 | AS | NLP | SC, 26

Section: Conventional Deep Learning
confidence: 99%
“…With the plethora of recent vision methods that rely on the attention mechanism and the transformer architecture, many works have emerged that survey these methods. Some of these works consider transformers in vision in general [14], [15], [16], [17], [18], while others focus on a specific aspect, such as efficiency [19], or a specific application, such as video [20] or medical imaging [21]. Considering the differences between the 2D and 3D data representation and processing, special attention to transformers applied to 3D vision applications is essential.…”
Section: Introduction
confidence: 99%