Multi-Scale Context Aggregation for Semantic Segmentation of Remote Sensing Images

Zhang, Jing; Lin, Shaofu; Ding, Lei; Bruzzone, Lorenzo

doi:10.3390/rs12040701

Cited by 116 publications

(41 citation statements)

References 30 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In recent years, networks such as AlexNet, VGG16, GoogleNet, and InceptionV3 have shown strong generalization capabilities in scene recognition. The use of ready-made pre-trained CNN models as a general feature extractor has become a method of remote sensing scene classification [19]. However, generating large parameters during the training process leads to higher requirements on hardware devices.…”

Section: Related Workmentioning

confidence: 99%

Urban Remote Sensing Scene Recognition Based on Lightweight Convolution Neural Network

2021

View full text Add to dashboard Cite

The use rate of urban land is a significant sign to evaluate urban construction, and scene recognition has important application value in improving urban land use rate. In this paper, a new lightweight model based on VGG16 is proposed to extract distinct features of remote sensing images through five convolution modules. This model uses depthwise separable convolution to reduce the network parameters. An adaptive pooling layer is added to solve the inherent non-adaptive problem of the convolution network. It makes the network insensitive to the size of the input image. The global average pooling layer is used to sum the information to make the input spatial transformation more stable. This paper conducts training and testing on two data sets, NWPU-RESISC45 Dataset and SIRI-WHU Dataset, and the recognition scenarios are 13 and 12 categories. Experimental results show that this method is better than other models in recognition accuracy and model size.

show abstract

Section: Related Workmentioning

confidence: 99%

Urban Remote Sensing Scene Recognition Based on Lightweight Convolution Neural Network

2021

View full text Add to dashboard Cite

show abstract

“…Inspired by natural image semantic segmentation solutions, many targeted approaches, based on encoder-decoder networks, are designed for remote-sensing images. To acquire the desired accurate segmentation and localization, Zhang, Lin, and Ding et al (2020) combined HRNet (High-resolution network) with ASP (Adaptive spatial pooling) to increase localization accuracy and preserve spatial details. The experimental results on the Vaihingen dataset (http://www2.isprs.org/commissions/comm3/wg4/2dsem-label-vaihingen.html) and Potsdam dataset (http://www2.isprs.org/commissions/ comm3/wg4/2d-sem-label-potsdam.html) reach SOTA.…”

Section: Introductionmentioning

confidence: 99%

Dual attention deep fusion semantic segmentation networks of large-scale satellite remote-sensing images

Lyu

et al. 2021

International Journal of Remote Sensing

View full text Add to dashboard Cite

Since DCNNs (deep convolutional neural networks) have been successfully applied to various academic and industrial fields, semantic segmentation methods, based on DCNNs, are increasingly explored for remote-sensing image interpreting and information extracting. It is still highly challenging due to the presence of irregular target shapes, and similarities of inter-and intra-class objects in largescale high-resolution satellite images. A majority of existing methods fuse the multi-scale features that always fail to provide satisfactory results. In this paper, a dual attention deep fusion semantic segmentation network of large-scale satellite remote-sensing images is proposed (DASSN_RSI). The framework consists of novel encoderdecoder architecture, and a weight-adaptive loss function based on focal loss. To refine high-level semantic and low-level spatial feature maps, the deep layer channel attention module (DLCAM) and shallow layer spatial attention module (SLSAM) are designed and appended with specific blocks. Then the DUpsampling is incorporated to fuse feature maps in a lossless way. Peculiarly, the weight-adaptive focal loss (W-AFL) is inferred and embedded successfully, alleviating the class-imbalanced issue as much as possible. The extensive experiments are conducted on Gaofen image dataset (GID) datasets (Gaofen-2 satellite images, coarse set with five categories and refined set with fifteen categories). And the results show that our approach achieves state-of-the-art performance compared to other typical variants of encoder-decoder networks in the numerical evaluation and visual inspection. Besides, the necessary ablation studies are carried out for a comprehensive evaluation.

show abstract

“…Finally, the comprehensive evaluation of three sets of data sets highlights the superiority of this method. Zhang et al [18] introduced a high-resolution network (HRNet) to enhance features to obtain contextual semantic information. This method uses the spatial linking method of the model to show more semantics for the low-resolution information containing more semantic information to the high-resolution information, and enhance the high-resolution information, thereby solving the positioning caused by cascading pooling.…”

Section: Introductionmentioning

confidence: 99%

Pixel-Level Prediction for Ocean Remote Sensing Image Features Fusion Based on Global and Local Semantic Relations

et al. 2021

View full text Add to dashboard Cite

With the rapid development of remote-sensing imaging technology, remote-sensing images have become increasingly diverse, and people are paying more attention to ocean remote-sensing research. Because ocean remote-sensing data are complex, and the ocean environment is diverse, results will differ, even if the same target is detected at different times in the same scene. To obtain more semantic features and better pixel-level prediction capabilities, this paper proposes a pixel-level ocean remote-sensing image algorithm (GLPO-Net) that combines local and global features. First, texture features, color features, and spatial relationship features are extracted. Second, the algorithm constructs a multiscale local cross-attention mechanism strategy to obtain feature weight information in different directions to fully mine the local features of ocean remote-sensing images. Concurrently, an algorithm constructs a multiscale global cross-attention mechanism strategy to obtain global features. Then, the fusion of global features and local features is described in each submodule to obtain more representative deep features. Finally, small-sample ocean remote-sensing is described via image pixel-level prediction. The algorithm proposed in this paper has been tested with three public ocean remote-sensing datasets. The experimental results show that the proposed GLPO-Net algorithm can learn features from small samples of ocean remote-sensing images. Compared to the prediction results of other remote-sensing image algorithms, GLPO-Net exhibits better prediction capabilities. INDEX TERMS Ocean remote sensing, deep learning, features fusion, multi-scale convolutional.

show abstract

Multi-Scale Context Aggregation for Semantic Segmentation of Remote Sensing Images

Cited by 116 publications

References 30 publications

Urban Remote Sensing Scene Recognition Based on Lightweight Convolution Neural Network

Urban Remote Sensing Scene Recognition Based on Lightweight Convolution Neural Network

Dual attention deep fusion semantic segmentation networks of large-scale satellite remote-sensing images

Pixel-Level Prediction for Ocean Remote Sensing Image Features Fusion Based on Global and Local Semantic Relations

Contact Info

Product

Resources

About