Siamese network-based trackers that use modern deep feature extraction networks often fail to take full advantage of the different levels of features. As a result, they are prone to tracking drift in challenging aerial scenarios such as target occlusion, scale variation, and low-resolution targets, and their accuracy in these scenarios is low. To improve the performance of existing Siamese trackers in these scenes, we propose SiamHAS, a Siamese tracker based on Transformer multi-level feature enhancement with a hierarchical attention strategy. The Transformer multi-level enhancement increases the saliency of the extracted features, while the hierarchical attention strategy lets the tracker adaptively attend to target region information, improving tracking performance in challenging aerial scenarios. We conduct extensive experiments, with both qualitative and quantitative discussion, on the UAV123, UAV20L, and OTB100 datasets. The experimental results show that SiamHAS performs favorably against several state-of-the-art trackers in these challenging scenarios.