2022
DOI: 10.48550/arxiv.2201.05991
Preprint

Video Transformers: A Survey

Abstract: Transformer models have shown great success modeling long-range interactions. Nevertheless, they scale quadratically with input length and lack inductive biases. These limitations can be further exacerbated when dealing with the high dimensionality of video. Proper modeling of video, which can span from seconds to hours, requires handling long-range interactions. This makes Transformers a promising tool for solving video-related tasks, but some adaptations are required. While there are previous works that stud…
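The abstract's point that Transformers "scale quadratically with input length" comes from the n × n score matrix that self-attention materializes. The following is a minimal NumPy sketch (not taken from the paper) of single-head self-attention, just to make the quadratic term concrete; the toy shapes and the video token count in the comment are illustrative assumptions.

```python
import numpy as np

def self_attention(x):
    """Naive single-head self-attention over n tokens of dimension d.

    The (n, n) score matrix below is the quadratic-in-n cost the
    abstract refers to; for video, n grows with frames x patches.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                   # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # (n, d) contextualized tokens

# Illustration: a 16-frame clip with 196 patches per frame already gives
# n = 16 * 196 = 3136 tokens, i.e. a ~3136 x 3136 attention matrix per head.
tokens = np.random.randn(8, 4)  # tiny example: n = 8, d = 4
out = self_attention(tokens)
```

Doubling the number of tokens quadruples the score matrix, which is why the video Transformers surveyed here need adaptations (restricted, factorized, or approximated attention) rather than the vanilla mechanism above.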



Cited by 7 publications (9 citation statements)
References 125 publications
“…Before being introduced into the field of time series prediction, Transformer has shown dominant performance in NLP and CV community [Vaswani et al., 2017; Kenton and Toutanova, 2019; Han et al., 2021; Han et al., 2020; Khan et al., 2021; Selva et al., 2022]. One of the key advantages Transformer holds in these fields is being able to increase prediction power through increasing model size.…”
Section: Model Size Analysis
confidence: 99%
“…Over the past few years, numerous Transformer variants have been proposed to advance the state-of-the-art performances of various tasks significantly. There are quite a few literature reviews from different aspects, such as in NLP applications [Han et al., 2021], CV applications [Han et al., 2020; Khan et al., 2021; Selva et al., 2022], efficient Transformers [Tay et al., 2020], and attention models [Chaudhari et al., 2021; Galassi et al., 2020].…”
1 https://github.com/qingsongedu/time-series-transformers-review
Section: Introduction
confidence: 99%
“…The scale of used data is much larger than traditional methods, but it's still limited. The pursuit…”

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models [38] | 2022 | arXiv | MM | DC, 19
A survey on vision transformer [39] | 2022 | TPAMI | CV | DC, 23
Transformers in vision: A survey [40] | 2021 | CSUR | CV | SC, 38
A Survey of Visual Transformers [41] | 2021 | arXiv | CV | DC, 21
Video Transformers: A Survey [42] | 2022 | arXiv | CV | DC, 24
Threats to Pre-trained Language Models: Survey and Taxonomy [43] | 2022 | arXiv | NLP | DC, 8
A survey on bias in deep NLP [44] | 2021 | AS | NLP | SC, 26

Section: Conventional Deep Learning
confidence: 99%
“…With the plethora of recent vision methods that rely on the attention mechanism and the transformer architecture, many works have emerged that survey these methods. Some of these works consider transformers in vision in general [14], [15], [16], [17], [18], while others focus on a specific aspect, such as efficiency [19], or a specific application, such as video [20] or medical imaging [21]. Considering the differences between the 2D and 3D data representation and processing, special attention to transformers applied to 3D vision applications is essential.…”
Section: Introduction
confidence: 99%