Lakes on the Qinghai-Tibet Plateau are important carriers of water resources in Asia, and their dynamic changes directly reflect variations in the plateau's climate and water resources. To address the limited ability of Convolutional Neural Networks (CNNs) to learn spatial relationships between distant but continuous pixels, this study proposes ViTenc-UNet, a U-Net-based water recognition model for lakes on the Qinghai-Tibet Plateau. The method replaces the consecutive convolutional layers in the U-Net encoder with a Vision Transformer (ViT), which allows the continuous spatial relationships of lake water bodies to be identified and extracted more accurately. A Convolutional Block Attention Module (CBAM) was added to the decoder so that the spatial and spectral characteristics of the water bodies are preserved more completely. The experimental results show that the ViTenc-UNet model completes the task of lake water recognition on the Qinghai-Tibet Plateau more efficiently: the Overall Accuracy, Intersection over Union, Recall, Precision, and F1 score of the lake water classification results reached 99.04%, 98.68%, 99.08%, 98.59%, and 98.75%, which were, respectively, 4.16%, 6.20%, 5.34%, 4.80%, and 5.34% higher than those of the original U-Net model. The model also shows advantages of varying degrees over the FCN, DeepLabv3+, TransUNet, and Swin-Unet models. By introducing ViT and CBAM into the water extraction task for lakes on the Qinghai-Tibet Plateau, the model achieves excellent classification performance for these lake bodies and can provide an important scientific reference for accurate, real-time monitoring of important water resources on the plateau.
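For readers unfamiliar with the five reported metrics, the following is a minimal sketch of how they are conventionally computed for a binary water mask. It assumes the standard pixel-wise definitions (1 = water, 0 = background); the abstract does not state the exact formulas used, and the function name `binary_metrics` and the toy masks are hypothetical.

```python
def binary_metrics(pred, truth):
    """Pixel-wise metrics for flattened binary masks (1 = water, 0 = background).

    Assumes the conventional definitions; not taken from the paper itself.
    """
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1          # water correctly detected
        elif p == 1 and t == 0:
            fp += 1          # background labelled as water
        elif p == 0 and t == 1:
            fn += 1          # water missed
        else:
            tn += 1          # background correctly rejected

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    overall_accuracy = (tp + tn) / total if total else 0.0

    return {
        "overall_accuracy": overall_accuracy,
        "iou": iou,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical toy masks, flattened to 1-D lists of pixel labels.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(pred, truth)
```

Note that IoU ignores true negatives, so it is usually the strictest of the five when background pixels dominate the scene.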
This paper addresses the problems of omission, misclassification, and inter-adhesion that arise from overly dense distribution, intraclass diversity, and interclass variability when extracting winter wheat (WW) from high-resolution images. It proposes RAunet, a deeply supervised network with multi-scale features that combines a dual-attention mechanism with an improved U-Net backbone network. The model mainly consists of a pyramid input layer, a modified U-Net backbone network, and a side output layer. Firstly, the pyramid input layer fuses the feature information of winter wheat at different scales by constructing multiple input paths. Secondly, the Atrous Spatial Pyramid Pooling (ASPP) residual module and the Convolutional Block Attention Module (CBAM) dual-attention mechanism are added to the U-Net model to form the backbone network, which enhances the model's ability to extract winter wheat features. Finally, the side output layer consists of multiple classifiers that supervise the outputs at different scales. When the RAunet model was used to extract the spatial distribution of WW from GF-2 imagery, the mIoU of the recognition results reached 92.48%, an improvement of 2.66%, 4.15%, 1.42%, 2.35%, 3.76%, and 0.47% over FCN, U-Net, DeepLabv3, SegNet, ResUNet, and UNet++, respectively. These results verify the superiority of the RAunet model for WW extraction from high-resolution images and show that it effectively improves the accuracy of extracting WW spatial distribution information.
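The mIoU figure reported here averages the per-class Intersection over Union. A minimal sketch of that calculation follows, assuming integer class labels and averaging only over classes that occur in the prediction or the ground truth; the abstract does not specify the averaging convention, and the name `mean_iou` and the toy labels are hypothetical.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over classes for flattened label lists.

    Assumption: classes absent from both prediction and ground truth
    are skipped rather than counted as IoU = 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy labels: 0 = background, 1 = winter wheat.
pred = [0, 0, 1, 1, 1, 0]
truth = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, truth, num_classes=2)
```

Because every class contributes equally to the mean, a small class that is segmented poorly lowers mIoU as much as a large one would.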