A deep neural network is suitable for remote sensing image pixel-wise classification because it effectively extracts features from the raw data. However, remote sensing images with higher spatial resolution exhibit smaller inter-class differences and greater intra-class differences; thus, feature extraction becomes more difficult. The attention mechanism, as a method that simulates the manner in which humans comprehend and perceive images, is useful for the quick and accurate acquisition of key features. In this study, we propose a novel neural network that incorporates two kinds of attention mechanisms in its mask and trunk branches; i.e., control gate (soft) and feedback attention mechanisms, respectively, based on the branches’ primary roles. Thus, a deep neural network can be equipped with an attention mechanism to perform pixel-wise classification for very high-resolution remote sensing (VHRRS) images. The control gate attention mechanism in the mask branch is utilized to build pixel-wise masks for feature maps, to assign different priorities to different locations on different channels for feature extraction recalibration, to apply stress to the effective features, and to weaken the influence of other profitless features. The feedback attention mechanism in the trunk branch allows for the retrieval of high-level semantic features. Hence, additional aids are provided for lower layers to re-weight the focus and to re-update higher-level feature extraction in a target-oriented manner. These two attention mechanisms are fused to form a neural network module. By stacking various modules with different-scale mask branches, the network utilizes different attention-aware features under different local spatial structures. The proposed method is tested on the VHRRS images from the BJ-02, GF-02, Geoeye, and Quickbird satellites, and the influence of the network structure and the rationality of the network design are discussed. Compared with other state-of-the-art methods, our proposed method achieves competitive accuracy, thereby proving its effectiveness.
Abstract:Deep neural networks (DNNs) face many problems in the very high resolution remote sensing (VHRRS) per-pixel classification field. Among the problems is the fact that as the depth of the network increases, gradient disappearance influences classification accuracy and the corresponding increasing number of parameters to be learned increases the possibility of overfitting, especially when only a small amount of VHRRS labeled samples are acquired for training. Further, the hidden layers in DNNs are not transparent enough, which results in extracted features not being sufficiently discriminative and significant amounts of redundancy. This paper proposes a novel depth-width-reinforced DNN that solves these problems to produce better per-pixel classification results in VHRRS. In the proposed method, densely connected neural networks and internal classifiers are combined to build a deeper network and balance the network depth and performance. This strengthens the gradients, decreases negative effects from gradient disappearance as the network depth increases and enhances the transparency of hidden layers, making extracted features more discriminative and reducing the risk of overfitting. In addition, the proposed method uses multi-scale filters to create a wider neural network. The depth of the filters from each scale is controlled to decrease redundancy and the multi-scale filters enable utilization of joint spatio-spectral information and diverse local spatial structure simultaneously. Furthermore, the concept of network in network is applied to better fuse the deeper and wider designs, making the network operate more smoothly. The results of experiments conducted on BJ02, GF02, geoeye and quickbird satellite images verify the efficacy of the proposed method. The proposed method not only achieves competitive classification results but also proves that the network can continue to be robust and perform well even while the amount of labeled training samples is decreasing, which fits the small training samples situation faced by VHRRS per-pixel classification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.