2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.01028

Learning Spatio-Temporal Transformer for Visual Tracking

Cited by 566 publications (398 citation statements). References 42 publications.
“…Thus, it becomes the winner of VOT-RT2021. In the real-time track, TransT-M performs 1.9% higher than the second-best tracker, STARK [28], which is also a transformer-based tracker. The VOT sequences are difficult: they contain many appearance changes and similar-target interference.…”
Section: Evaluation on VOT
confidence: 99%
“…At the same time, [27] also employed a Transformer, combining it with SiameseRPN [4] and DiMP [12] as a feature-enhancement module to improve tracker performance rather than to replace the correlation. STARK [28] proposes another transformer tracking framework that concatenates the search region and the template. It also employs a corner prediction head to improve the accuracy of bounding-box prediction, and a dynamic template to fuse temporal information.…”
Section: Related Work
confidence: 99%
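The concatenation described above can be sketched roughly as follows (a minimal illustration, not the authors' code; the feature-map sizes and token dimension are assumptions):

```python
import numpy as np

# Assumed feature maps: an 8x8 template patch and a 16x16 search region,
# each flattened into tokens of dimension d, as in a STARK-style pipeline.
d = 256
template_tokens = np.random.randn(8 * 8, d)   # 64 template tokens
search_tokens = np.random.randn(16 * 16, d)   # 256 search-region tokens

# Concatenate along the token axis so that self-attention over the joint
# sequence can relate template and search-region features to each other.
tokens = np.concatenate([template_tokens, search_tokens], axis=0)
print(tokens.shape)  # → (320, 256)
```

Because the two sets of tokens share one sequence, no explicit correlation operation is needed; attention performs the matching.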
“…Most work in tracking focuses on discriminative tracking [82], [111], [112], [113], employing a Transformer to spatially relate the tracked object to its surroundings, effectively leveraging global attention to discriminate between the tracked object and the background. Since the Transformer relies on an accurate representation, the template feature used for discrimination is progressively updated with a moving average [111], [113]. Alternatively, the Transformer can be used to attend to objects that interact with the tracked object, and to use that information to infer tracking and predict the movements of occluded actors and/or objects [63].…”
Section: S23 Tracking
confidence: 99%
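The moving-average template update mentioned above can be sketched as an exponential moving average of the template feature (the update rate `alpha` and function name are assumptions for illustration):

```python
import numpy as np

def update_template(z_old, z_cur, alpha=0.1):
    """Progressively update the template feature with a moving average:
    keep most of the old template, blend in a fraction of the current one."""
    return (1.0 - alpha) * z_old + alpha * z_cur

# Toy features: old template of ones, current-frame feature of zeros.
z_old = np.ones(4)
z_cur = np.zeros(4)
z_new = update_template(z_old, z_cur, alpha=0.1)
print(z_new)  # → [0.9 0.9 0.9 0.9]
```

A small `alpha` makes the template drift slowly, trading adaptation speed for robustness to transient appearance changes.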
“…Recently, the transformer [33] has been successfully applied to many vision tasks [9,22,3]. In the tracking field, the transformer also boosts performance [4,35,40]. However, the transformer involves heavily sequential computation, and its computational cost grows with the square of the number of input tokens.…”
Section: Introduction
confidence: 99%
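The quadratic cost noted above follows from self-attention forming an n × n score matrix over n input tokens; a trivial sketch (token counts are illustrative):

```python
def attention_scores(n_tokens):
    """Number of pairwise attention scores for n input tokens:
    every token attends to every token, giving an n x n matrix."""
    return n_tokens * n_tokens

print(attention_scores(320))  # → 102400
print(attention_scores(640))  # → 409600, i.e. 4x the cost for 2x the tokens
```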