2024
DOI: 10.3390/rs16071168

Multimodal Features Alignment for Vision–Language Object Tracking

Ping Ye,
Gang Xiao,
Jun Liu

Abstract: Vision–language tracking presents a crucial challenge in multimodal object tracking. Integrating language features and visual features can enhance target localization and improve the stability and accuracy of the tracking process. However, most existing fusion models in vision–language trackers simply concatenate visual and linguistic features without considering their semantic relationships. Such methods fail to distinguish the target’s appearance features from the background, particularly when the target cha…

Cited by 1 publication
References 53 publications (108 reference statements)