2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00857
Global Tracking Transformers

Cited by 93 publications (40 citation statements)
References 40 publications
“…4.1. Note that while AOA [14] has a better ClsA, it ensembles multiple few-shot detection and re-identification models trained on additional datasets, as reported by previous works [36,81]. Overall, our approach surpasses the previous state-of-the-art by 1.4 points in TETA and 2.3 points in Track mAP while using a weaker backbone and the same detector.…”
Section: Comparison to State-of-the-Art
confidence: 61%
“…For our ablation studies, we use the same 6-epoch fine-tuning as above. For data hallucination, we use the combined LVISv1 and COCO annotations as used in [10,18,81]. Note that for data hallucination, we only add objects with a bounding box area greater than 64² to A⁺.…”
Section: Experiment Details
confidence: 99%
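The area threshold described in the statement above can be sketched as follows. This is a hypothetical illustration, not the cited authors' code; the `(x1, y1, x2, y2)` box format and the function names are assumptions.

```python
# Sketch of the data-hallucination filter: only objects whose bounding box
# area exceeds 64^2 pixels are added to the augmentation set A+.
MIN_AREA = 64 ** 2  # 4096 px^2, the threshold quoted in the citing paper

def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def filter_hallucination_candidates(boxes):
    """Keep only boxes large enough to be used for data hallucination."""
    return [b for b in boxes if box_area(b) > MIN_AREA]
```

For example, a 100×100 box passes the filter while a 10×10 box is dropped.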
See 2 more Smart Citations
“…It is capable of linking objects after a long time span, which is realized by storing the identity embeddings of the tracked objects in a large spatiotemporal memory, and by adaptively referencing and aggregating useful information from the memory as needed. Global Tracking Transformers (GTR) (Zhou et al., 2022) is a global MOT network structure based on transformers, which uses them to encode all target features in the input video sequence and assigns the targets to different trajectories using trajectory queries.…”
Section: Vision Transformer-based MOT
confidence: 99%
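The global association idea described in the statement above can be sketched in a few lines: trajectory queries score all per-frame detection features, and each query claims its highest-scoring detection in every frame. This is a minimal NumPy illustration under assumed shapes, not GTR's actual transformer implementation (which learns the queries and uses attention rather than a raw dot product).

```python
import numpy as np

# Assumed toy setup: 3 trajectory queries, 5 frames, 4 detections per frame.
rng = np.random.default_rng(0)
num_queries, feat_dim, num_frames, dets_per_frame = 3, 8, 5, 4

queries = rng.standard_normal((num_queries, feat_dim))
frames = [rng.standard_normal((dets_per_frame, feat_dim)) for _ in range(num_frames)]

# Each trajectory is the sequence of (frame, detection) pairs its query selects.
tracks = {q: [] for q in range(num_queries)}
for t, dets in enumerate(frames):
    scores = queries @ dets.T          # (num_queries, num_dets) similarity
    assign = scores.argmax(axis=1)     # best detection per trajectory query
    for q, d in enumerate(assign):
        tracks[q].append((t, int(d)))
```

Because every frame of the clip is scored jointly against the same set of queries, association is global over the sequence rather than frame-to-frame, which is the property the citing paper highlights.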