2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw.2018.00047
Stacked U-Nets for Ground Material Segmentation in Remote Sensing Imagery

Cited by 55 publications (33 citation statements)
References 18 publications
“…In recent years, some tentative work has been proposed for multimodal data analysis in RS (Gómez-Chova et al., 2015, Kampffmeyer et al., 2016, Máttyus et al., 2016, Audebert et al., 2016, Audebert et al., 2017, Zampieri et al., 2018, Ghosh et al., 2018). Closely related to ours, an early deep-fusion architecture for scene parsing with multimodal deep networks, which simply stacks all modalities as input, has been used for semantic segmentation of urban RS images (Kampffmeyer et al., 2016).…”
Section: Related Work
confidence: 99%
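
The early deep-fusion scheme quoted above (Kampffmeyer et al., 2016) amounts to concatenating all modalities along the channel axis before the first convolution. A minimal PyTorch sketch of that idea, with hypothetical tensor names and channel counts (a 3-band RGB image plus a 1-band DSM), might look like this:

```python
import torch
import torch.nn as nn

# Hypothetical inputs: a 3-band RGB image and a 1-band DSM of the same size.
rgb = torch.randn(1, 3, 256, 256)
dsm = torch.randn(1, 1, 256, 256)

# Early fusion: stack all modalities along the channel axis ...
x = torch.cat([rgb, dsm], dim=1)  # shape (1, 4, 256, 256)

# ... then feed the stacked tensor to an ordinary segmentation network
# whose first convolution simply accepts 3 + 1 = 4 input channels.
first_conv = nn.Conv2d(in_channels=4, out_channels=64, kernel_size=3, padding=1)
features = first_conv(x)
print(features.shape)  # torch.Size([1, 64, 256, 256])
```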
“…4 showing the end-to-end pipeline of the DDCM-Net combined with a pre-trained model for land cover classification. Compared to other encoder-decoder architectures, our proposed DDCM-Net fuses low-level features only once, before the final prediction CNN layers, instead of aggregating multi-scale features captured at many different encoder layers [2], [6], [27], [31], [32], [33], [34], [35], [36]. This makes our model simple and neat, yet effective, with a lower computational cost.…”
Section: The DDCM Network
confidence: 99%
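
The single-fusion design quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' DDCM-Net code; the encoder channel sizes and class count are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleFusionHead(nn.Module):
    """Illustrative decoder that fuses low-level features once, then predicts."""
    def __init__(self, low_ch=64, high_ch=256, num_classes=6):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Conv2d(low_ch + high_ch, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, kernel_size=1),
        )

    def forward(self, low_feats, high_feats):
        # Upsample the deep features to the low-level resolution ...
        high = F.interpolate(high_feats, size=low_feats.shape[2:],
                             mode="bilinear", align_corners=False)
        # ... and fuse exactly once, right before the prediction layers,
        # instead of merging multi-scale features at many encoder stages.
        x = torch.cat([low_feats, high], dim=1)
        return self.classifier(x)

head = SingleFusionHead()
low = torch.randn(1, 64, 128, 128)   # low-level encoder features
high = torch.randn(1, 256, 16, 16)   # deep encoder features
logits = head(low, high)             # shape (1, 6, 128, 128)
```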
“…Our DDCM-SER50 model achieves a new state-of-the-art result with 56.2% mIoU on the DeepGlobe land cover classification challenge dataset. As shown in Table IV, we compare our DDCM network with other published models ([30], [31], [32], [33], [34], [35], [36]) on the hold-out validation set (the public leaderboard as of May 1, 2019). Our model obtained more than 3.5% higher mIoU than the second-best model [35].…”
Section: DeepGlobe
confidence: 99%
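
mIoU in this excerpt is mean intersection-over-union, averaged over classes. For reference, a minimal sketch of how the metric is typically computed (hypothetical label maps, not the challenge's official scoring code):

```python
import torch

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:          # class absent from both maps: skip it
            continue
        inter = (pred_c & target_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious)

# Hypothetical 2x3 label maps with 3 classes.
pred = torch.tensor([[0, 1, 1], [2, 2, 0]])
target = torch.tensor([[0, 1, 2], [2, 2, 0]])
print(mean_iou(pred, target, num_classes=3))  # ~0.72
```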
“…Through end-to-end training, the U-Net takes as input an image of any size and produces a segmentation map of matching spatial dimensions. Owing to these properties, U-Net has achieved a high level of success and has been applied in various segmentation tasks [4], [5].…”
Section: Introduction
confidence: 99%
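
The properties quoted above (any input size, same-size output, end-to-end training) follow from U-Net's fully convolutional encoder-decoder structure with skip connections. A toy one-level sketch in PyTorch, with made-up channel widths rather than the original paper's configuration:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy one-level U-Net: encoder, bottleneck, decoder with a skip connection."""
    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        # Skip connection: the decoder sees encoder features concatenated back in.
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        e = self.enc(x)                  # full-resolution features
        m = self.mid(self.down(e))       # half-resolution bottleneck
        d = self.dec(torch.cat([self.up(m), e], dim=1))
        return self.head(d)              # per-pixel class logits

net = TinyUNet()
x = torch.randn(1, 3, 256, 256)   # any spatial size divisible by 2 works here
print(net(x).shape)               # torch.Size([1, 2, 256, 256])
```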