Autonomous target recognition (ATR) plays a crucial role in maintaining situational awareness during environmental monitoring. Unmanned aerial vehicles (UAVs) equipped with ATR technology can gather and analyze real-time information about targets, including their locations, sizes, and types. However, UAV-captured images in complex real-world environments often exhibit significant variations in perspective and scale as UAV altitude and distance change. Existing ATR methods for UAVs struggle to detect targets in large field-of-view, multi-scale images, resulting in low recognition accuracy and high false-positive rates.
This paper introduces two novel slimmable neural network models, SE-YOLOv5s and ST-YOLOv5s, based on the YOLOv5s architecture. These models incorporate the Squeeze-and-Excitation (SE) and Swin Transformer mechanisms, respectively, to enhance feature extraction from large field-of-view, multi-scale images. Their performance was evaluated on the VisDrone2019 aerial dataset. Compared to the baseline YOLOv5s algorithm, applying SE-YOLOv5s and ST-YOLOv5s to autonomous target recognition on low-altitude drones improved recognition accuracy and reduced false-positive rates.
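The Squeeze-and-Excitation mechanism mentioned above reweights feature-map channels by their global context: a "squeeze" step pools each channel to a scalar, and an "excitation" step passes those scalars through a small bottleneck network to produce per-channel gating weights. The sketch below illustrates this idea with plain NumPy and randomly initialized, untrained weights; the function name, weight shapes, and reduction ratio are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Minimal Squeeze-and-Excitation (SE) reweighting of a feature map.

    x  : (C, H, W) feature map
    w1 : (C//r, C) bottleneck reduction weights (hypothetical, untrained)
    w2 : (C, C//r) expansion weights (hypothetical, untrained)
    """
    # Squeeze: global average pooling collapses each channel to one scalar
    z = x.mean(axis=(1, 2))                     # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid yields per-channel gates in (0, 1)
    s = np.maximum(w1 @ z, 0.0)                 # shape (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # shape (C,)
    # Scale: reweight each input channel by its gate
    return x * s[:, None, None]

# Usage sketch: 16 channels, 8x8 spatial map, reduction ratio r = 4
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
y = squeeze_excite(x, w1, w2)
```

Because each gate lies in (0, 1), the block can only attenuate channels, letting the network emphasize informative channels relative to suppressed ones at negligible computational cost.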
The proposed methods combine slimmable neural network design with feature-enhancement mechanisms to address the challenges posed by complex real-world environments in UAV missions. Advances in autonomous target recognition on low-altitude drones will contribute significantly to situational awareness in future environmental monitoring.