2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01323
TubeR: Tubelet Transformer for Video Action Detection

Cited by 33 publications (32 citation statements); references 28 publications.
“…Recent works, such as [63,72], highlight the effectiveness of Transformer-based approaches for the task of detecting spatio-temporal tubes in videos. In particular, TubeR [72] proposed an end-to-end approach using no proposals or person detectors.…”
Section: Related Work
confidence: 99%
“…Zhao et al. [18] propose an end‐to‐end action detection framework, which can be optimised for modelling action tubes with variable lengths and aspect ratios.…”
Section: Related Work
confidence: 99%
“…Action Detection is a more challenging problem [20,90,69] than action recognition [67,6] due to the additional requirement of localising actions in a large spatio-temporal search space. Supervised action detection methods [81,69,34,44,90,56] have made large strides thanks to large-scale datasets like UCF24 [73], AVA [26] and MultiSports [41]. Most current approaches follow the key-frame based approach popularised by SlowFast [20].…”
Section: Related Work
confidence: 99%
“…There have been more sophisticated approaches, e.g. based on actor-context modelling [10,56], on long-term feature banks [82,74], and on transformer heads [90,45]. We will use the key-frame based SlowFast [20] network as our default action detector because of its simplicity, competitive performance, and the reproducible code base provided in pySlowFast [19], which can easily be extended to include transformer architectures, such as MViTv2 [45].…”
Section: Related Work
confidence: 99%