2020
DOI: 10.48550/arxiv.2012.15460
Preprint

TransTrack: Multiple Object Tracking with Transformer

Abstract: Multiple-object tracking (MOT) is mostly dominated by complex, multi-step tracking-by-detection algorithms, which perform object detection, feature extraction, and temporal association separately. The query-key mechanism in single-object tracking (SOT), which tracks the object in the current frame by the object feature of the previous frame, has great potential to set up a simple joint-detection-and-tracking MOT paradigm. Nonetheless, the query-key method is seldom studied due to its inability to detect new-coming objects…
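
As a rough illustration of the query-key idea the abstract describes, here is a minimal PyTorch-style sketch of a joint-detection-and-tracking step, assuming a DETR-like decoder: object features decoded from the previous frame act as track queries on the current frame, while a set of learned detect queries handles new-coming objects. This is not the authors' implementation; the single shared decoder and all names and sizes are illustrative assumptions.

```python
# A minimal sketch of query-key joint detection and tracking -- not the
# authors' code; module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class QueryKeyTracker(nn.Module):
    def __init__(self, dim=256, num_detect_queries=100):
        super().__init__()
        # Learned "detect queries" find objects, including new-coming ones.
        self.detect_queries = nn.Parameter(torch.randn(num_detect_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.box_head = nn.Linear(dim, 4)  # (cx, cy, w, h), normalized

    def forward(self, frame_feats, prev_track_queries=None):
        # frame_feats: (B, HW, dim) backbone features of the current frame.
        B = frame_feats.size(0)
        detect_q = self.detect_queries.unsqueeze(0).expand(B, -1, -1)
        if prev_track_queries is None:
            queries = detect_q                      # first frame: detect only
        else:
            # Previous-frame object features act as track queries, so the
            # boxes decoded from them follow the same identities.
            queries = torch.cat([prev_track_queries, detect_q], dim=1)
        decoded = self.decoder(queries, frame_feats)
        boxes = self.box_head(decoded).sigmoid()
        # Decoded features of kept objects become the next frame's track queries.
        return boxes, decoded
```

In the paper itself the track and detect branches produce two box sets that are merged by IoU matching; the sketch collapses them into one decoder for brevity.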

Cited by 109 publications (201 citation statements)
References 60 publications

“…The elegance of ViT [23] has also motivated similar model designs with simpler global operators such as MLP-Mixer [85], gMLP [53], GFNet [74], and FNet [43], to name a few. Despite successful applications to many high-level tasks [4,23,56,83,87,100], the efficacy of these global models on low-level enhancement and restoration problems has not been studied extensively. The pioneering works on Transformers for low-level vision [9,14] directly applied full self-attention, which only accepts relatively small patches of fixed sizes (e.g., 48×48).…”
Section: Enhancement (mentioning)
confidence: 99%
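A back-of-envelope calculation (my own, not from the quoted papers) makes that patch-size constraint concrete: a full self-attention map has one entry per token pair, so its size grows quadratically with the number of tokens.

```python
# Back-of-envelope arithmetic (illustrative, not from the cited papers):
# a full self-attention map has tokens**2 entries per head.
def attention_entries(height: int, width: int) -> int:
    tokens = height * width        # one token per spatial position
    return tokens ** 2

print(attention_entries(48, 48))    # 5,308,416 (~5.3M entries)
print(attention_entries(480, 480))  # ~5.3e10 entries, 10,000x larger
```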
“…Track-RCNN [22] and FairMOT [23] further add a Re-ID branch on top of the object detector in a joint training framework, incorporating object detection and Re-ID feature learning. Based on DETR, TransTrack [9] and TrackFormer [24] develop transformer-based frameworks for MOT.…”
Section: Related Work (mentioning)
confidence: 99%
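As a sketch of the joint detection-and-Re-ID pattern mentioned above (my illustration, not Track-RCNN or FairMOT code): a shared backbone feeds both a detection head and an identity-embedding head, so the two tasks are trained together. All layer names and sizes here are assumptions.

```python
# Minimal sketch of a joint detection + Re-ID network -- illustrative only,
# not Track-RCNN or FairMOT; all layers and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDetReID(nn.Module):
    def __init__(self, feat_dim=256, num_classes=1, embed_dim=128):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3)
        self.det_head = nn.Conv2d(feat_dim, num_classes + 4, kernel_size=1)  # scores + box
        self.reid_head = nn.Conv2d(feat_dim, embed_dim, kernel_size=1)       # ID embedding

    def forward(self, images):
        feats = torch.relu(self.backbone(images))
        detections = self.det_head(feats)
        # L2-normalized per-location embeddings; association compares them
        # across frames by cosine similarity.
        embeddings = F.normalize(self.reid_head(feats), dim=1)
        return detections, embeddings
```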
“…Building on transformer-based methods [9] and accounting for the heterogeneity of different modalities, our CMC2R is a fully end-to-end framework that fuses information collaboratively using a two-stream structure and a transformer structure, with detection and tracking trained jointly. Second, NMS is not needed for track association, and a temporal passing module combined with multi-frame tracking features is proposed to model the temporal relation.…”
Section: Related Work (mentioning)
confidence: 99%