The key challenge in RGB-T tracking is how to fuse information from the two modalities to build a robust tracker. Motivated by the ability of CNNs to capture local features and of vision transformers to model global representations, the authors propose a two-stream hybrid structure, termed CMC2R, which combines convolutional operations and self-attention mechanisms to learn an enhanced representation. CMC2R fuses local features and global representations at different resolutions through the transformer layers of the encoder block, and the two modalities collaborate through spatial and channel self-attention to capture contextual information. Temporal association is performed with track queries: each track query models the entire track of an object and is updated frame by frame to build long-range temporal relations. Experimental results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance.
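As a rough illustration of how spatial and channel self-attention can let the two modalities exchange context, the sketch below stacks RGB and thermal feature maps along the channel axis and applies a generic dual (channel + spatial) self-attention with a residual sum. This is a simplification under stated assumptions, not the authors' encoder: the function names, the flattened (C, N) feature layout, the scaled dot-product without learned projections, and the additive fusion are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def channel_self_attention(f):
    # f: (C, N) features; every channel attends to every channel
    c, n = f.shape
    attn = softmax(f @ f.T / np.sqrt(n), axis=-1)  # (C, C), rows sum to 1
    return attn @ f  # (C, N)


def spatial_self_attention(f):
    # f: (C, N) features; every spatial position attends to every position
    c, n = f.shape
    attn = softmax(f.T @ f / np.sqrt(c), axis=-1)  # (N, N), rows sum to 1
    return f @ attn.T  # (C, N): column i is a weighted sum of columns j


def cross_modal_fuse(f_rgb, f_tir):
    # Hypothetical fusion: stack RGB and thermal channels so that both
    # attention maps mix information across the two modalities
    f = np.concatenate([f_rgb, f_tir], axis=0)  # (2C, N)
    return f + channel_self_attention(f) + spatial_self_attention(f)


rng = np.random.default_rng(0)
rgb = rng.standard_normal((3, 5))  # 3 channels, 5 flattened positions
tir = rng.standard_normal((3, 5))
print(cross_modal_fuse(rgb, tir).shape)  # (6, 5)
```

Because the modalities are concatenated before attention, each channel-attention row mixes RGB and thermal channels, and each spatial-attention row draws on both modalities at every position. A trained model would typically add learned query, key, and value projections on top of this.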
The attention mechanism has produced impressive results in object tracking, but CNN-based approaches still dominate when a good trade-off between performance and efficiency is required, owing to the quadratic complexity of attention. Here, the SGF module is proposed: an efficient feature fusion block for object tracking that reduces the complexity of attention to linear. The proposed method fuses features with attention in a coarse-to-fine manner. In the low-resolution semantic branch, the top K regions with the highest attention scores are selected; in the high-resolution detail branch, attention is computed only within the regions corresponding to those top K regions. Thus, features from the high-resolution branch can be fused efficiently under the guidance of the low-resolution branch. Experiments on RGB and RGB-T datasets with reformed FairMOT and MDNet+RGBT trackers demonstrate the effectiveness of the proposed method.
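The coarse-to-fine top-K selection can be sketched for a single query vector as follows. This is a hypothetical simplification, not the SGF module itself: it assumes each low-resolution token summarizes one region of P high-resolution tokens and uses plain scaled dot-product scores without learned projections. The names `coarse_to_fine_attention`, `k_low`, `k_high`, `v_high`, and `top_k` are illustrative. Per query, the sketch scores R + K·P tokens instead of the R·P tokens that full attention over the high-resolution map would require.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()


def coarse_to_fine_attention(q, k_low, k_high, v_high, top_k):
    """Hypothetical region-guided attention for one query.

    q:      (d,)       query vector
    k_low:  (R, d)     one low-resolution key per region
    k_high: (R, P, d)  the P high-resolution keys inside each region
    v_high: (R, P, d)  the matching high-resolution values
    """
    d = q.shape[0]
    # Coarse step: score all R low-resolution regions, keep the top K
    coarse_scores = k_low @ q / np.sqrt(d)
    selected = np.argsort(-coarse_scores)[:top_k]
    # Fine step: attend only over high-resolution tokens of selected regions
    keys = k_high[selected].reshape(-1, d)      # (K * P, d)
    values = v_high[selected].reshape(-1, d)    # (K * P, d)
    weights = softmax(keys @ q / np.sqrt(d))    # (K * P,)
    return weights @ values, selected


q = np.array([1.0, 0.0])
k_low = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [1.0, 0.0]])
k_high = np.zeros((4, 3, 2))
v_high = np.zeros((4, 3, 2))
v_high[0] = 100.0  # region 0 scores lowest, so it is never attended to
v_high[2] = 1.0
v_high[3] = 1.0
out, selected = coarse_to_fine_attention(q, k_low, k_high, v_high, top_k=2)
print(selected, out)  # [2 3] [1. 1.]
```

With `top_k` fixed, the fine step's cost no longer grows with the number of regions R, which is one way to see where the efficiency gain comes from; the exact linear-complexity formulation of SGF is not reproduced here.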