2021 · Preprint
DOI: 10.48550/arxiv.2112.00995

SwinTrack: A Simple and Strong Baseline for Transformer Tracking

Cited by 26 publications (61 citation statements)
References 0 publications
“…For example, Liang et al presented SwinIR [20] for image restoration, which first proposed the convolutional layer to extract shallow features, and then adopted Swin transformer for deep feature extraction. Lin et al introduced SwinTrack [21] to interact with the target object and search region for tracking. However, few studies have developed transformer into image fusion fields.…”
Section: A Transformer In Vision Tasks
confidence: 99%
“…Following the previous methods [51,12], we train the models on the train splits of four datasets GOT10k [36], TrackingNet [59], LaSOT [20], and COCO [52] and report the success score (SUC) for the TrackingNet dataset and LaSOT dataset, and the average overlap (AO) for GOT10k. We use the SwinTrack [51] to train and evaluate our pre-trained models with the same data augmentations, training, and inference settings. We sample 131072 pairs per epoch and train the models for 300 epochs.…”
Section: Geometric and Motion Tasks
confidence: 99%
“…For the video object tracking, MIM models also show a stronger transfer ability over supervised pretrained models. On the long-term dataset LaSOT, SwinTrack [51] with MIM pre-trained SwinV2-B backbone achieves comparable result with the SOTA MixFormer-L [12] with a larger image size 320 × 320. We obtain the best SUC of 70.7 on the LaSOT with SwinV2-L backbone with the input image size 224 × 224 and template size 112 × 112.…”
Section: Geometric and Motion Tasks
confidence: 99%