In recent years, the increase in drone traffic and the potential for unauthorized surveillance have underscored the urgent need for advances in drone detection. Although rapid progress in deep learning has significantly improved object detection, air-to-air unmanned aerial vehicle (UAV) detection remains difficult. Complex backgrounds, the small size of UAVs in captured images, and variations in flight pose and viewing angle all challenge traditional deep learning approaches, mainly because conventional convolutional neural network architectures are inherently limited in discriminating fine details against dynamically changing backdrops. To address these challenges, this study introduces EA-DINO, a new deep learning network based on enhanced aggregation (EA) and DINO. The network makes two improvements to DINO. First, the backbone is replaced with a Swin transformer, and agent attention is integrated. Second, an EA feature pyramid network is added to the architecture. Experimental evaluations show that, under the complex conditions of air-to-air UAV detection, EA-DINO achieves an $\mathrm{mAP}_{50}$ of 96.6\% on the Det-Fly dataset, an improvement of 8.3\% over the baseline DINO model. This gain is also notable relative to other mainstream models, illustrating the effectiveness of the proposed model for air-to-air UAV detection.