2017
DOI: 10.3390/rs10010052
Effective Fusion of Multi-Modal Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling

Abstract: In recent years, Fully Convolutional Networks (FCNs) have led to great improvements in semantic labeling for various applications, including multi-modal remote sensing data. Although different fusion strategies have been reported for multi-modal data, there has been no in-depth study of the reasons for their performance limits. For example, it is unclear why an early fusion of multi-modal data in an FCN does not lead to a satisfying result. In this paper, we investigate the contribution of individual layers inside FC…
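The early fusion the abstract refers to can be illustrated with a minimal sketch: the optical bands and an elevation band (e.g. a DSM) are stacked along the channel axis before being fed to the first convolutional layer. The function name and normalization scheme below are hypothetical, not taken from the paper:

```python
import numpy as np

def early_fuse(rgb, dsm):
    """Early fusion: stack an RGB tile (H, W, 3) and a single-band DSM
    tile (H, W) along the channel axis, after rescaling the DSM to
    [0, 1] so its range matches the normalized imagery."""
    rgb = rgb.astype(np.float32) / 255.0
    dsm = dsm.astype(np.float32)
    dsm = (dsm - dsm.min()) / (dsm.max() - dsm.min() + 1e-8)
    return np.concatenate([rgb, dsm[..., None]], axis=-1)  # (H, W, 4)

tile = early_fuse(
    np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8),
    np.random.rand(256, 256) * 40.0,  # heights in metres
)
print(tile.shape)  # (256, 256, 4)
```

The resulting 4-channel tensor is what an "early fusion" FCN would consume in its first layer; the paper's point is that this naive stacking alone does not guarantee good performance.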

Cited by 35 publications (36 citation statements)
References 22 publications
“…Recently, some studies [63][64][65][66] found that the effective fusion of color imagery with elevation data (such as a DSM) may help resolve these problems. Elevation data, which contain the height of the ground surface, make it easy to discriminate building roofs from impervious surfaces.…”
Section: Limitations of Deep Learning Models in This Study
confidence: 99%
“…While observation-level fusion directly combines the raw datasets, feature-level fusion integrates the feature sets derived from multiple modalities into a single feature set. Several researchers have demonstrated improved performance with multimodal fusion approaches [13][14][15][27]. However, the performance varies depending on the robustness of the fusion strategy and its efficacy in combining multimodal information in a complementary manner.…”
Section: Methodology and Conceptual Framework
confidence: 99%
“…To address this fundamental question, this research focuses on 3D point cloud segmentation with various fusion and non-fusion approaches and evaluates their performance. If the data representations are similar, or are transformed into a similar representation as in [10,13], multimodal fusion can be carried out in numerous ways. Nonetheless, both the data representation and the range of values differ between a LiDAR point cloud and images; hence, the fusion approach has to respect the characteristics of both modalities.…”
Section: Methodology and Conceptual Framework
confidence: 99%
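The feature-level fusion described in this excerpt can be sketched as follows: per-pixel feature maps produced by two modality-specific branches are concatenated and then mixed by a learned projection (a 1×1 convolution, here written as a plain matrix product). The function name, channel counts, and random weights are illustrative assumptions, not the cited authors' architecture:

```python
import numpy as np

def feature_fuse(img_feat, elev_feat, c_out=16):
    """Feature-level fusion: concatenate per-pixel feature maps from two
    modality branches, then mix them with a 1x1 projection plus ReLU.
    Shapes: (H, W, C1) and (H, W, C2) -> (H, W, c_out)."""
    fused = np.concatenate([img_feat, elev_feat], axis=-1)  # (H, W, C1+C2)
    c_in = fused.shape[-1]
    # Hypothetical 1x1-conv weights; in practice these are learned.
    w = np.random.randn(c_in, c_out) * (1.0 / np.sqrt(c_in))
    return np.maximum(fused @ w, 0.0)  # ReLU

out = feature_fuse(np.random.rand(64, 64, 32), np.random.rand(64, 64, 8))
print(out.shape)  # (64, 64, 16)
```

Because fusion happens on features rather than raw values, each branch can first normalize its own modality (reflectance vs. height), which is one way to respect the differing value ranges the excerpt mentions.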
“…Image size normalization uses a series of geometric size adjustments to ensure that the original images have a unified size or location feature. In the course of image feature extraction, training, or classification, many images may be used, and size normalization ensures that the processed images share the same geometric features, so that subsequent feature extraction or training can be carried out smoothly [3]. Image size normalization mainly includes scaling, translation, and rotation.…”
Section: Digital Image Processing Methods
confidence: 99%
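The scaling step of size normalization can be sketched with a nearest-neighbour resize: every output pixel is mapped back to its source coordinate, so images of arbitrary size are brought to one common geometry before feature extraction. This is a minimal illustration, not the cited authors' implementation; production code would typically use a library resize with interpolation:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour size normalization: map each output pixel back
    to its nearest source pixel so all images share the same size."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row per output row
    cols = np.arange(out_w) * w // out_w   # source column per output column
    return img[rows[:, None], cols]        # fancy indexing builds the grid

small = np.arange(16).reshape(4, 4)
print(resize_nearest(small, 8, 8).shape)  # (8, 8)
```

Translation and rotation are handled analogously by mapping output coordinates through a shift or rotation matrix before sampling.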