A critical obstacle to semantic segmentation of remote sensing images with deep convolutional neural networks is the need for a huge number of pixel-level labels. Taking building extraction as an example, this study focuses on how to effectively apply weakly supervised semantic segmentation (WSSS) with image-level labels to high-resolution remote sensing (HR) images, a prominent solution to the labeling challenge. The widely used two-step WSSS framework is adopted, in which pseudo-masks are first produced from image-level labels and a segmentation network is then trained on those pseudo-masks. In addition, the fully connected conditional random field (CRF) is used to exploit spatial context in both the training and prediction stages. Detailed analyses are carried out on applying WSSS to HR images in terms of producing pseudo-masks, training the segmentation network, and optimizing predictions. We show that the trade-off between precision and recall of pseudo-masks, as well as the boundary accuracy and the background, needs to be carefully considered. The benefits of the segmentation network in the two-step framework are demonstrated in comparison with using a classification network alone for WSSS, and CRF-loss is found to be powerful for improving the segmentation network but not appropriate for dense buildings. An overlapping strategy and CRF post-processing are further shown to be effective for optimizing the segmentation results during inference. Through deliberate settings, we can generate results comparable to fully supervised results on the ISPRS Potsdam and Vaihingen datasets, which is meaningful for promoting WSSS applications for extracting geographic information from HR images.
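The overlapping strategy mentioned for inference can be illustrated with a minimal sketch. It shows one common form of overlap inference (averaging the scores of overlapping tiles), which is an assumption and not necessarily the exact scheme used in the study. The names `predict_with_overlap` and `predict_fn`, and the default tile and stride sizes, are hypothetical.

```python
import numpy as np


def predict_with_overlap(image, predict_fn, tile=256, stride=128):
    """Average per-pixel scores from overlapping tiles.

    image: (H, W, C) array.
    predict_fn: hypothetical callable mapping an (h, w, C) tile to an
        (h, w) array of foreground scores (stands in for the network).
    """
    if not 0 < stride <= tile:
        raise ValueError("stride must be in (0, tile] so tiles leave no gaps")
    h, w = image.shape[:2]
    score_sum = np.zeros((h, w), dtype=np.float64)
    hits = np.zeros((h, w), dtype=np.float64)

    def starts(size):
        # Tile origins along one axis; the last tile is shifted so that it
        # ends exactly at the image border.
        if size <= tile:
            return [0]
        origins = list(range(0, size - tile + 1, stride))
        if origins[-1] != size - tile:
            origins.append(size - tile)
        return origins

    for y in starts(h):
        for x in starts(w):
            patch = image[y:y + tile, x:x + tile]
            score_sum[y:y + tile, x:x + tile] += predict_fn(patch)
            hits[y:y + tile, x:x + tile] += 1.0

    # Every pixel is covered at least once, so the division is safe.
    return score_sum / hits
```

Pixels near tile borders, where a network sees the least context, receive scores from several tiles, which tends to smooth seams between tiles.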
Semantic segmentation of high-resolution remote sensing images has achieved great progress in recent years by utilizing deep convolutional neural networks (DCNNs). However, the decrease in feature-map resolution in DCNNs causes a loss of spatial information, which leads to blurred object boundaries and misclassification of small objects. In addition, the class imbalance and the high diversity of geographic objects in high-resolution images further degrade performance. To deal with these problems, we propose an end-to-end DCNN named GAMNet to balance the contradiction between global semantic information and local details. An integration of attention and gate module (GAM) is specially designed to simultaneously realize multi-scale feature extraction and boundary recovery. The integration module can be inserted into an encoder-decoder network with skip connections. Meanwhile, a composite loss function is designed to achieve deep supervision of GAM by adding an auxiliary loss, which helps improve the effectiveness of the integration module. The performance of GAMNet is quantitatively evaluated on the ISPRS 2D Semantic Labelling datasets, where it achieves state-of-the-art performance in comparison with other representative methods.
As a method of deep learning interpretability, class activation mapping (CAM) is efficient and convenient for extracting geographic objects supervised by image-level labels. However, in addition to the inherent inaccuracy and incompleteness of CAM, we have to deal with the spectral and spatial variance of geographic objects when applying CAM methods to remote sensing images. To explore the capabilities of CAM methods in extracting various geographic objects, we make a comprehensive comparison of five commonly used CAM methods, namely the original CAM, GradCAM, GradCAM++, SmoothGradCAM++, and ScoreCAM, in four aspects: (1) efficiency, (2) accuracy, (3) effectiveness in dealing with the spectral and spatial variance, and (4) performance in delineating different geographic object categories. The results demonstrate that the original CAM, GradCAM, and GradCAM++ achieve the highest efficiency, accuracy, and integrity for extracting geographic objects, respectively, which can help us choose the appropriate CAM method according to the specific requirements of different extraction tasks. Benefiting from its capability in extracting various geographic objects and its adaptability to complex scenes, GradCAM achieves the best performance in dealing with the spectral and spatial variance problem and shows the advantage of capturing object details while keeping objects complete. In addition to the comparison experiments and suggestions, we also explain the principles behind the performance differences. The findings of this study could contribute to a deeper understanding of different CAM methods and help in selecting suitable CAM methods for extracting geographic objects from the perspectives of both principles and experiments.
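For reference, the original CAM of the five compared methods weights the last convolutional feature maps by the classifier weights of the target class. The sketch below illustrates that weighted sum with NumPy. The function name, the array layout, and the ReLU and max-normalization steps are assumptions (common post-processing choices), not details taken from the study.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Original-CAM style map for one class.

    features: (K, H, W) feature maps from the last convolutional layer,
        assumed to feed global average pooling and one linear layer.
    fc_weights: (num_classes, K) weights of that linear layer.
    """
    # Weighted sum over the K channels: M_c(x, y) = sum_k w_k^c * f_k(x, y)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Assumed post-processing: keep positive evidence, scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Thresholding the returned map (for example `cam >= 0.5`, an arbitrary illustrative value) gives a coarse object mask of the kind used as a pseudo-label under image-level supervision.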
<p>Accurate assessment of the state and changes of permafrost active layer thickness (ALT) on the Qinghai-Tibet Plateau (QTP) is critical to understanding the underlying processes driven by global climate change. Interferometric Synthetic Aperture Radar (InSAR) is a proven method for quantifying deformation caused by natural and degradational permafrost processes. Given its high accuracy, this method has been applied to monitoring local and regional permafrost deformation on the QTP. However, large-scale regional ALT mapping algorithms that use accurate InSAR deformation data are still lacking. Here, we examine the complex processes in which the active layer thaws progressively in depth over space and time during the thawing season and the ground subsides because of the volume difference induced by the ice-water conversion. We developed a new model that infers ALT from the surface subsidence with the help of other parameters in the process. This model takes advantage of long-term InSAR-derived deformation data, including both the seasonal signal and the inter-annual trend. In addition, it introduces an empirical parameter to represent the contribution of the ice-water phase change, taking into account additional water contributions from other sources. We applied the method in the Kekexili region of the QTP. The seasonal deformation was obtained from Sentinel-1 radar images using Small Baseline Subset Interferometry (SBAS-InSAR). The thawing water was estimated by combining soil moisture, precipitation, evapotranspiration and runoff data. Based on deformation data, vegetation cover information and existing ALT products, the empirical parameter was obtained by a data-driven regression method. Finally, a new InSAR-derived permafrost ALT map of the Kekexili region from 2015 to 2020 was produced. The results show that the average ALT is 1.94 m with a standard deviation of 0.35 m. A comparative discussion with permafrost maps produced using other methods is given.</p>
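The link between seasonal subsidence and thaw depth can be illustrated with a deliberately simplified calculation. This is not the model developed in the study: it assumes pore water is spread uniformly over the active layer, uses nominal densities for water and ice, and introduces a hypothetical `phase_change_fraction` argument only to mimic the idea of an empirical parameter. All names and example values are assumptions.

```python
RHO_WATER = 1000.0  # kg/m^3, nominal density of liquid water
RHO_ICE = 917.0     # kg/m^3, nominal density of ice


def alt_from_subsidence(subsidence_m, porosity, saturation,
                        phase_change_fraction=1.0):
    """Rough active layer thickness (m) from seasonal subsidence (m).

    Assumes uniform porosity and saturation with depth, and that
    `phase_change_fraction` of the subsidence (a hypothetical empirical
    factor) comes from the ice-to-water volume change.
    """
    # Ice occupies RHO_WATER / RHO_ICE times the volume of the water it
    # melts into, so a water column of height h_w subsides by
    # h_w * (RHO_WATER - RHO_ICE) / RHO_ICE when it thaws.
    water_column = (phase_change_fraction * subsidence_m
                    * RHO_ICE / (RHO_WATER - RHO_ICE))
    # Convert the equivalent water column to soil thickness.
    return water_column / (porosity * saturation)


# Illustrative inputs only: 3 cm subsidence, porosity 0.4, saturation 0.5.
alt = alt_from_subsidence(0.03, porosity=0.4, saturation=0.5)
```

With these illustrative inputs the estimate is on the order of one to two metres, which shows why centimetre-scale InSAR deformation is sensitive to metre-scale changes in thaw depth.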
The fully convolutional network (FCN), which has excellent capability for capturing spatial context, was introduced to improve the performance of hyperspectral image classification (HSIC). However, training an FCN usually requires a huge number of pixel-level labels, which are difficult to obtain for HSIC in practical applications. How to train an FCN effectively under the supervision of a limited number of sparse point labels has therefore attracted attention. The patch-free training pattern with sparse point labels has proved effective for the HSIC task. This raises the question: is patch-based training of FCNs, a general training mode for remote sensing image semantic segmentation, also effective for HSIC with sparse point labels? To answer this question, a patch-based training framework with a novel fully convolutional network is proposed for HSIC in this study. First, cropped hyperspectral image (HSI) patches with sparse labels are used as training input. Second, considering the limited supervision that sparse points provide, a lightweight encoder-decoder network with shallow channels is specially designed for HSIC, with residual connections in the encoder and multiple attention modules integrated to fully exploit the spectral-spatial information of HSI. Third, conditional random field (CRF)-loss is adopted as a prior that complements the point supervision and further exploits spatial context information. The performance of the proposed method is quantitatively evaluated on three HSI datasets, where it achieves state-of-the-art performance in comparison with other representative methods, demonstrating the effectiveness of the patch-based training framework for HSIC.
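Supervision from sparse point labels is commonly implemented as a cross-entropy loss evaluated only at the labeled pixels. The sketch below shows that point-supervised term alone, under stated assumptions: the `IGNORE` marker, the function name, and the array layout are hypothetical, and the CRF-loss term described above is not included.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels without a point label


def partial_cross_entropy(logits, labels):
    """Mean cross-entropy over labeled pixels only.

    logits: (C, H, W) raw class scores for one patch.
    labels: (H, W) integer class ids, IGNORE where no label exists.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

    ys, xs = np.nonzero(labels != IGNORE)
    if ys.size == 0:
        return 0.0  # patch contains no labeled points
    return float(-log_probs[labels[ys, xs], ys, xs].mean())
```

Because unlabeled pixels contribute nothing to this term, a spatial prior such as CRF-loss is what propagates information from the few labeled points to their neighbours.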