Siamese network-based trackers that use modern deep feature extraction networks often do not take full advantage of the different feature levels. As a result, they are prone to tracking drift in challenging aerial scenarios such as target occlusion, scale variation, and low-resolution targets, and their accuracy in these scenarios is low. To improve the performance of existing Siamese trackers in these scenes, we propose SiamHAS, a Siamese tracker based on Transformer multi-level feature enhancement with a hierarchical attention strategy. Transformer multi-level enhancement increases the saliency of the extracted features, and the hierarchical attention strategy lets the tracker adaptively attend to target-region information, improving tracking performance in challenging aerial scenarios. We conducted extensive experiments, with qualitative and quantitative discussion, on the UAV123, UAV20L, and OTB100 datasets. The experimental results show that SiamHAS performs favorably against several state-of-the-art trackers in these challenging scenarios.
Siamese network-based trackers balance performance and efficiency in visual tracking, but they are not robust enough to handle target occlusion and similar objects. To improve robustness, this paper proposes a visual tracking method that combines a Transformer-based feature pyramid network (FPN) with response map enhancement. The Transformer-based feature pyramid structure encodes robust target-specific appearance features, and the response map enhancement module improves the tracker's ability to distinguish the object from the background. Extensive experiments and ablation studies are conducted on challenging benchmarks including UAV123, GOT-10K, LaSOT, and OTB100. The results show that the proposed tracker is more robust to target occlusion and similar objects, which improves its precision rate and success rate.
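The precision rate and success rate mentioned above can be illustrated with a minimal sketch. It is not taken from the paper. It assumes boxes in (x, y, w, h) format, a 20-pixel center-error threshold for precision, and a single 0.5 overlap threshold for success. These are common conventions in OTB-style evaluation, and the function names are hypothetical. Benchmarks typically also report success as the area under the curve over many overlap thresholds, which this sketch omits.

```python
from math import hypot


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))


def precision_rate(preds, gts, pixel_threshold=20.0):
    """Fraction of frames whose center error is within pixel_threshold (assumed 20 px)."""
    hits = sum(center_error(p, g) <= pixel_threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def success_rate(preds, gts, iou_threshold=0.5):
    """Fraction of frames whose overlap exceeds iou_threshold (assumed 0.5)."""
    hits = sum(iou(p, g) > iou_threshold for p, g in zip(preds, gts))
    return hits / len(gts)
```

For example, a tracker that drifts far from the target on one frame out of four loses a quarter of its precision rate, while a moderate offset that keeps the center within 20 pixels can still count as a failure under the overlap criterion.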