2022
DOI: 10.1049/ipr2.12427

CMC2R: Cross‐modal collaborative contextual representation for RGBT tracking

Abstract: The key challenge in RGBT tracking is how to fuse dual-modality information to build a robust RGB-T tracker. Motivated by the CNN structure for local features and the vision transformer structure for global representations, the authors propose a two-stream hybrid structure, termed CMC2R, that takes advantage of convolutional operations and self-attention mechanisms to learn enhanced representations. CMC2R fuses local features and global representations under different resolutions through the transformer layer of t…
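The abstract only outlines the architecture, so here is a minimal PyTorch-style sketch of the general idea it describes: a convolutional branch for local features, a self-attention branch for global context, and cross-attention between the RGB and thermal streams. All module names, dimensions, and the fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- not the CMC2R code. Module names, channel sizes,
# and the additive fusion rule below are assumptions.
import torch
import torch.nn as nn

class HybridStream(nn.Module):
    """One modality stream: convolutions for local features,
    self-attention for global representations."""
    def __init__(self, channels=256, heads=8):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.local(x)                       # local convolutional features
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        glob, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + glob), (h, w)

class CrossModalFusion(nn.Module):
    """Fuse RGB and thermal tokens; each modality queries the other."""
    def __init__(self, channels=256, heads=8):
        super().__init__()
        self.rgb_stream = HybridStream(channels, heads)
        self.tir_stream = HybridStream(channels, heads)
        self.cross = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, rgb_feat, tir_feat):      # backbone features, same shape
        rgb, (h, w) = self.rgb_stream(rgb_feat)
        tir, _ = self.tir_stream(tir_feat)
        rgb_ctx, _ = self.cross(rgb, tir, tir)  # RGB attends to thermal
        tir_ctx, _ = self.cross(tir, rgb, rgb)  # thermal attends to RGB
        fused = (rgb + rgb_ctx + tir + tir_ctx) / 2.0
        b, n, c = fused.shape
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Dummy usage with backbone feature maps at one resolution
rgb = torch.randn(1, 256, 16, 16)
tir = torch.randn(1, 256, 16, 16)
print(CrossModalFusion()(rgb, tir).shape)       # torch.Size([1, 256, 16, 16])
```

In this sketch the fusion is a simple additive combination of self- and cross-attended tokens; the actual CMC2R encoder performs this across multiple resolutions, as the citing excerpt below notes.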

Cited by 7 publications (1 citation statement)
References 38 publications
“…According to the published literature, the first work to introduce the Transformer into visible and infrared image fusion target tracking is a two-stream hybrid structure called CMC2R (Cross-modal collaborative contextual representation) [25]. The local and global representations at different resolutions are fused through the transformer layer of the encoder block, and the two modalities cooperate to acquire contextual information through spatial and channel self-attention.…”
Section: Methods Based On Transformer
confidence: 99%
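To make the spatial and channel self-attention mentioned in the excerpt more concrete, below is a small, hedged PyTorch sketch: spatial self-attention over the H×W positions followed by a channel self-attention computed from a C×C affinity matrix. The shapes, the sequential combination, and the residual connection are assumptions, not the paper's design.

```python
# Illustrative sketch of spatial + channel self-attention; the sequential
# combination and residual connection are assumptions, not the CMC2R design.
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    def __init__(self, channels=256, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        # spatial self-attention: tokens are the H*W positions
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, C)
        sp, _ = self.spatial(tokens, tokens, tokens)
        sp = sp.transpose(1, 2).reshape(b, c, h, w)
        # channel self-attention: affinity between channels, (B, C, C)
        flat = sp.flatten(2)                     # (B, C, H*W)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)
        out = (attn @ flat).reshape(b, c, h, w)  # re-weighted channel responses
        return out + x                           # residual connection (assumption)

feat = torch.randn(1, 256, 16, 16)
print(SpatialChannelAttention()(feat).shape)     # torch.Size([1, 256, 16, 16])
```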