Abstract:The rapid development of high spatial resolution (HSR) remote sensing imagery techniques not only provide a considerable amount of datasets for scene classification tasks but also request an appropriate scene classification choice when facing with finite labeled samples. AlexNet, as a relatively simple convolutional neural network (CNN) architecture, has obtained great success in scene classification tasks and has been proven to be an excellent foundational hierarchical and automatic scene classification technique. However, current HSR remote sensing imagery scene classification datasets always have the characteristics of small quantities and simple categories, where the limited annotated labeling samples easily cause non-convergence. For HSR remote sensing imagery, multi-scale information of the same scenes can represent the scene semantics to a certain extent but lacks an efficient fusion expression manner. Meanwhile, the current pre-trained AlexNet architecture lacks a kind of appropriate supervision for enhancing the performance of this model, which easily causes overfitting. In this paper, an improved pre-trained AlexNet architecture named pre-trained AlexNet-SPP-SS has been proposed, which incorporates the scale pooling-spatial pyramid pooling (SPP) and side supervision (SS) to improve the above two situations. Extensive experimental results conducted on the UC Merced dataset and the Google Image dataset of SIRI-WHU have demonstrated that the proposed pre-trained AlexNet-SPP-SS model is superior to the original AlexNet architecture as well as the traditional scene classification methods.
Geospatial object detection from high spatial resolution (HSR) remote sensing imagery is a significant and challenging problem when further analyzing object-related information for civil and engineering applications. However, the computational efficiency and the separate region generation and localization steps are two big obstacles for the performance improvement of the traditional convolutional neural network (CNN)-based object detection methods. Although recent object detection methods based on CNN can extract features automatically, these methods still separate the feature extraction and detection stages, resulting in high time consumption and low efficiency. As a significant influencing factor, the acquisition of a large quantity of manually annotated samples for HSR remote sensing imagery objects requires expert experience, which is expensive and unreliable. Despite the progress made in natural image object detection fields, the complex object distribution makes it difficult to directly deal with the HSR remote sensing imagery object detection task. To solve the above problems, a highly efficient and robust integrated geospatial object detection framework based on faster region-based convolutional neural network (Faster R-CNN) is proposed in this paper. The proposed method realizes the integrated procedure by sharing features between the region proposal generation stage and the object detection stage. In addition, a pre-training mechanism is utilized to improve the efficiency of the multi-class geospatial object detection by transfer learning from the natural imagery domain to the HSR remote sensing imagery domain. Extensive experiments and comprehensive evaluations on a publicly available 10-class object detection dataset were conducted to evaluate the proposed method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.