As a deep learning interpretability method, class activation mapping (CAM) offers an efficient and convenient way to extract geographic objects supervised only by image-level labels. However, beyond the inherent inaccuracy and incompleteness of CAM, applying CAM methods to remote sensing images requires dealing with the spectral and spatial variance of geographic objects. To explore the capability of CAM methods to extract various geographic objects, we make a comprehensive comparison of five commonly used CAM methods, including the original CAM, GradCAM, GradCAM++, SmoothGradCAM++, and ScoreCAM, in four aspects: (1) efficiency, (2) accuracy, (3) effectiveness in handling the spectral and spatial variance, and (4) performance in delineating different geographic object categories. The results demonstrate that the original CAM, GradCAM, and GradCAM++ achieve the highest efficiency, accuracy, and integrity for extracting geographic objects, respectively, which can guide the choice of CAM method according to the specific requirements of an extraction task. Benefiting from its capability to extract various geographic objects and its adaptability to complex scenes, GradCAM performs best in handling the spectral and spatial variance problem, capturing object details while preserving object completeness. Beyond the comparison experiments and recommendations, we also explain the principles behind the observed performance differences. These findings contribute to a deeper understanding of the different CAM methods and support the selection of suitable CAM methods for extracting geographic objects from the perspectives of both principle and experiment.
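The core of GradCAM, one of the compared methods, can be sketched in a few lines: pool the gradients of the target class score over the spatial dimensions to weight the convolutional feature maps, then apply a ReLU. This is a minimal NumPy sketch of the standard GradCAM formula, not the abstract's specific experimental pipeline; the array shapes are illustrative assumptions.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from a convolutional layer.

    activations: array of shape (K, H, W), the feature maps A^k
    gradients:   array of shape (K, H, W), dY_c / dA^k for target class c
    """
    # alpha_k: global-average-pool the gradients over the spatial dims
    weights = gradients.mean(axis=(1, 2))                      # shape (K,)
    # weighted sum of feature maps, then ReLU to keep positive evidence
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                                  # scale to [0, 1]
    return cam
```

In practice the activations and gradients would come from a trained CNN's last convolutional layer; the original CAM variant instead uses the learned weights of a global-average-pooling classifier, which is why it is faster but less flexible.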
Urban green space plays a crucial role in building ecological cities and livable environments. While multi-temporal remote sensing images provide strong support for monitoring urban green cover, they often suffer from data shift, in which the data distribution varies from phase to phase. Designing a general multi-temporal framework for extracting urban green cover is therefore challenging, mainly because of time-consuming data labeling and inconsistent predictions. To address this, we propose multi-training, a novel method for land cover classification on multi-temporal remote sensing images. Multi-training is a two-stage method that independently trains a classifier on each phase in the training stage and then combines the information from all the classifiers in the communication stage. As a semi-supervised learning method, multi-training adopts a new rule for estimating the confidence of predictions on unlabeled samples, which reduces the dependence on labeled data and increases the consistency of results between phases. Experimental results show that multi-training outperforms self-training, co-training, tri-training, and super-training in both accuracy and consistency on multi-temporal remote sensing image datasets. Furthermore, we analyze the key parameters of our method and conclude that the number and the combination of phases dominate the prediction results.
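The two-stage structure described above can be sketched as follows. The paper's exact confidence rule and base classifier are not given in the abstract, so this hypothetical sketch substitutes a toy nearest-centroid classifier and a simple cross-phase agreement rule for the communication stage: unlabeled samples on which all per-phase classifiers agree are pseudo-labeled and fed back into training.

```python
import numpy as np

class NearestCentroid:
    """Toy stand-in base classifier; any per-phase model could be used."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def multi_training(phases_labeled, phases_unlabeled, rounds=3):
    """Hypothetical sketch of the two-stage loop.

    Stage 1 (training): fit one classifier per phase independently.
    Stage 2 (communication): pseudo-label unlabeled samples on which
    all phase classifiers agree (a stand-in for the paper's rule),
    then retrain each phase classifier with the confident samples.
    """
    clfs = [NearestCentroid().fit(X, y) for X, y in phases_labeled]
    for _ in range(rounds):
        preds = [clf.predict(Xu) for clf, Xu in zip(clfs, phases_unlabeled)]
        # "confident" = every phase classifier predicts the same label
        agree = np.all([p == preds[0] for p in preds], axis=0)
        if not agree.any():
            break
        for i, (X, y) in enumerate(phases_labeled):
            Xu = phases_unlabeled[i]
            X_new = np.vstack([X, Xu[agree]])
            y_new = np.concatenate([y, preds[i][agree]])
            clfs[i] = NearestCentroid().fit(X_new, y_new)
    return clfs
```

Because pseudo-labels are only accepted when all phases agree, the retrained classifiers tend to make consistent predictions across phases, which mirrors the consistency benefit the abstract reports.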