Object detection in optical remote sensing images is an important and challenging task. In recent years, the methods based on convolutional neural networks have made good progress. However, due to the large variation in object scale, aspect ratio, and arbitrary orientation, the detection performance is difficult to be further improved. In this paper, we discuss the role of discriminative features in object detection, and then propose a Critical Feature Capturing Network (CFC-Net) to improve detection accuracy from three aspects: building powerful feature representation, refining preset anchors, and optimizing label assignment. Specifically, we first decouple the classification and regression features, and then construct robust critical features adapted to the respective tasks through the Polarization Attention Module (PAM). With the extracted discriminative regression features, the Rotation Anchor Refinement Module (R-ARM) performs localization refinement on preset horizontal anchors to obtain superior rotation anchors. Next, the Dynamic Anchor Learning (DAL) strategy is given to adaptively select high-quality anchors based on their ability to capture critical features. The proposed framework creates more powerful semantic representations for objects in remote sensing images and achieves high-performance real-time object detection. Experimental results on three remote sensing datasets including HRSC2016, DOTA, and UCAS-AOD show that our method achieves superior detection performance compared with many state-of-the-art approaches. Code and models are available at https://github.com/ming71/CFC-Net.
Arbitrary-oriented objects exist widely in natural scenes, and thus the oriented object detection has received extensive attention in recent years. The mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent the rotating objects. However, these methods suffer from the representation ambiguity for oriented object definition, which leads to suboptimal regression optimization and the inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects. Specifically, RIL treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. Then, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in OBB representation. Extensive experiments on remote sensing datasets and scene text datasets show that our method achieves consistent and substantial improvement. The source code and trained models are available at https://github.com/ming71/RIDet.
Arbitrary-oriented objects widely appear in natural scenes, aerial photographs, remote sensing images, etc., and thus arbitrary-oriented object detection has received considerable attention. Many current rotation detectors use plenty of anchors with different orientations to achieve spatial alignment with ground truth boxes. Intersection-over-Union (IoU) is then applied to sample the positive and negative candidates for training. However, we observe that the selected positive anchors cannot always ensure accurate detections after regression, while some negative samples can achieve accurate localization. It indicates that the quality assessment of anchors through IoU is not appropriate, and this further leads to inconsistency between classification confidence and localization accuracy. In this paper, we propose a dynamic anchor learning (DAL) method, which utilizes the newly defined matching degree to comprehensively evaluate the localization potential of the anchors and carries out a more efficient label assignment process. In this way, the detector can dynamically select high-quality anchors to achieve accurate object detection, and the divergence between classification and regression will be alleviated. With the newly introduced DAL, we can achieve superior detection performance for arbitrary-oriented objects with only a few horizontal preset anchors. Experimental results on three remote sensing datasets HRSC2016, DOTA, UCAS-AOD as well as a scene text dataset ICDAR 2015 show that our method achieves substantial improvement compared with the baseline model. Besides, our approach is also universal for object detection using horizontal bound box. The code and models are available at https://github.com/ming71/DAL.
Object detection in aerial images has received extensive attention in recent years. The current mainstream anchor-based methods directly divide the training samples into positives and negatives according to the intersection-over-unit (IoU) of the preset anchors. This label assignment strategy assigns densely arranged samples for training, which leads to a suboptimal learning process and cause the model to suffer serious duplicate detections and missed detections. In this paper, we propose a sparse label assignment strategy (SLA) to select high-quality sparse anchors based on the posterior IoU of detections. In this way, the inconsistency between classification and regression is alleviated, and better performance can be achieved through balanced training. Next, to accurately detect small and densely arranged objects, we use a position-sensitive feature pyramid network (PS-FPN) with a coordinate attention module to extract position-sensitive features for accurate localization. Finally, the distance rotated IoU loss is proposed to eliminate the inconsistency between the training loss and the evaluation metric for better bounding box regression. Extensive experiments on the DOTA, HRSC2016, and UCAS-AOD datasets demonstrate the superiority of the proposed approach.
It is still challenging to effectively detect ship objects in optical remote-sensing images with complex backgrounds. Many current CNN-based one-stage and two-stage detection methods usually first predefine a series of anchors with various scales, aspect ratios and angles, and then the detection results can be outputted by performing once or twice classification and bounding box regression for predefined anchors. However, most of the defined anchors have relatively low accuracy, and are useless for the following classification and regression. In addition, the preset anchors are not robust to produce good performance for other different detection datasets. To avoid the above problems, in this paper we design a paired semantic segmentation network to generate more accurate rotated anchors with smaller numbers. Specifically, the paired segmentation network predicts four parts (i.e., top-left, bottom-right, top-right, and bottom-left parts) of ships. By combining paired top-left and bottom-right parts (or top-right and bottom-left parts), we can take the minimum bounding box of these two parts as the rotated anchor. This way can be more robust to different ship datasets, and the generated anchors are more accurate and have fewer numbers. Furthermore, to effectively use fine-scale detail information and coarse-scale semantic information, we use the magnified convolutional features to classify and regress the generated rotated anchors. Meanwhile, the horizontal minimum bounding box of the rotated anchor is also used to combine more context information. We compare the proposed algorithm with state-of-the-art object-detection methods for natural images and ship-detection methods, and demonstrate the superiority of our method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.