Exploring Fusion Strategies for Accurate RGBT Visual Object Tracking

Tang, Zhangyong; Xu, Tianyang; Li, Hui; Wu, Xiaojun; Zhu, Xuefeng; Kittler, Josef

doi:10.48550/arxiv.2201.08673

Cited by 2 publications

(3 citation statements)

References 62 publications

(117 reference statements)


“…In the VOT2020-RGBT challenge, the core metric to evaluate the tracking performance is EAO. The four trackers used in the comparative experiment are: AFAT [44] with single modality (RGB and infrared) as input, combined RFN and AFAT (RFNT) [15], M2C2Frgbt [40], and the decision-level fusion tracker (DFAT) [43].…”

Section: The Tracking Results on VOT2020-RGBT (mentioning)

confidence: 99%


RFN-Nest: An end-to-end residual fusion network for infrared and visible images

2021


“…In order to apply our fusion framework to the multimodal object tracking task, a state-of-the-art RGBT tracker (DFAT) [43] is adopted as the baseline tracker. The DFAT won the third place in the evaluation on the public dataset and was the winning tracker in the VOT2020-RGBT challenge.…”

Section: Experiments on RGBT Object Tracking Task (mentioning)

confidence: 99%

RFN-Nest: An end-to-end residual fusion network for infrared and visible images

2021


“…To employ the temporal continuity in a video sequence, the history information was integrated to obtain fusion features by computing the adaptive weights of previous frames [23]. Tang et al. [24] proposed multiple fusion strategies from different perspectives (including pixel-level, feature-level and decision-level) to boost the performance of multi-modal object tracking in video.…”

Section: Feature Aggregation Methods for RGB-T Object Tracking (mentioning)

confidence: 99%
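
The citation statement above names three fusion levels (pixel-level, feature-level and decision-level). The toy sketch below only illustrates where the fusion step sits in each case; it is not the method of Tang et al. The `extract` and `respond` functions, the weighted-sum fusion rule and the one-dimensional "images" are all hypothetical stand-ins chosen for brevity.

```python
from typing import List

Vector = List[float]


def weighted_sum(a: Vector, b: Vector, w: float) -> Vector:
    """Element-wise w*a + (1-w)*b: an assumed, deliberately simple fusion rule."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


def extract(signal: Vector) -> Vector:
    """Hypothetical feature extractor: rectified differences of neighbours."""
    return [max(0.0, signal[i + 1] - signal[i]) for i in range(len(signal) - 1)]


def respond(features: Vector) -> Vector:
    """Hypothetical tracking head: squares each feature to form a response map."""
    return [f * f for f in features]


def pixel_level(rgb: Vector, tir: Vector, w: float = 0.5) -> Vector:
    # Fuse the raw inputs first, then run a single pipeline.
    return respond(extract(weighted_sum(rgb, tir, w)))


def feature_level(rgb: Vector, tir: Vector, w: float = 0.5) -> Vector:
    # Extract features per modality, fuse them, then run one head.
    return respond(weighted_sum(extract(rgb), extract(tir), w))


def decision_level(rgb: Vector, tir: Vector, w: float = 0.5) -> Vector:
    # Run a full pipeline per modality and fuse only the response maps.
    return weighted_sum(respond(extract(rgb)), respond(extract(tir)), w)


if __name__ == "__main__":
    rgb = [0.0, 1.0, 0.0, 0.0]  # toy visible-light signal
    tir = [0.0, 0.0, 2.0, 0.0]  # toy thermal-infrared signal
    print("pixel-level:   ", pixel_level(rgb, tir))
    print("feature-level: ", feature_level(rgb, tir))
    print("decision-level:", decision_level(rgb, tir))
```

Because `extract` and `respond` are nonlinear, the three variants generally give different response maps for the same inputs, which is why the choice of fusion level matters.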

SCA-MMA: Spatial and Channel-Aware Multi-Modal Adaptation for Robust RGB-T Object Tracking

et al. 2022


The RGB and thermal (RGB-T) object tracking task is challenging, especially with various target changes caused by deformation, abrupt motion, background clutter and occlusion. It is critical to exploit the complementary nature of visual RGB and thermal infrared data. In this work, we address the RGB-T object tracking task with a novel spatial- and channel-aware multi-modal adaptation (SCA-MMA) framework, which builds an adaptive feature learning process to better mine this object-aware information in a unified network. For each modality, a spatial-aware adaptation mechanism is introduced to dynamically learn the location-based characteristics of specific tracking objects at multiple convolution layers. Further, a channel-aware multi-modal adaptation mechanism is proposed to adaptively learn the feature fusion/aggregation of different modalities. To perform object tracking, we employ a binary classification module with two fully connected layers to predict the bounding boxes of specific targets. Comprehensive evaluations on the GTOT and RGBT234 datasets demonstrate the significant superiority of our proposed SCA-MMA for robust RGB-T object tracking tasks. In particular, the precision rate (PR) and success rate (SR) on the GTOT and RGBT234 datasets reach 90.5%/73.2% and 80.2%/56.9%, respectively, significantly higher than the state-of-the-art algorithms.
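
The abstract reports precision rate (PR) and success rate (SR) without defining them. The sketch below assumes the definitions commonly used in tracking benchmarks: PR as the fraction of frames whose predicted box centre lies within a pixel threshold of the ground truth, and SR as the area under the success curve over overlap (IoU) thresholds. The `(x, y, width, height)` box format, the 20-pixel default and the 21 threshold steps are assumptions, not values taken from the paper.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, width, height)


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between the centres of two boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return math.hypot(ax - bx, ay - by)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_rate(pred: List[Box], gt: List[Box], threshold: float = 20.0) -> float:
    """Fraction of frames with centre error at or below the (assumed) threshold."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


def success_rate(pred: List[Box], gt: List[Box], steps: int = 21) -> float:
    """Mean of the success curve sampled at evenly spaced IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


if __name__ == "__main__":
    ground_truth = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    predictions = [(0.0, 0.0, 10.0, 10.0), (100.0, 100.0, 10.0, 10.0)]
    print("PR:", precision_rate(predictions, ground_truth))
    print("SR:", success_rate(predictions, ground_truth))
```

In this toy run one of the two frames is tracked perfectly and the other is lost, so PR is 0.5 and SR is slightly below 0.5.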
