Abstract:In recent years, Fully Convolutional Networks (FCN) have led to a great improvement of semantic labeling for various applications including multi-modal remote sensing data. Although different fusion strategies have been reported for multi-modal data, there is no in-depth study of the reasons of performance limits. For example, it is unclear, why an early fusion of multi-modal data in FCN does not lead to a satisfying result. In this paper, we investigate the contribution of individual layers inside FCN and propose an effective fusion strategy for the semantic labeling of color or infrared imagery together with elevation (e.g., Digital Surface Models). The sensitivity and contribution of layers concerning classes and multi-modal data are quantified by recall and descent rate of recall in a multi-resolution model. The contribution of different modalities to the pixel-wise prediction is analyzed explaining the reason of the poor performance caused by the plain concatenation of different modalities. Finally, based on the analysis an optimized scheme for the fusion of layers with image and elevation information into a single FCN model is derived. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset (infrared and RGB imagery as well as elevation) and the Potsdam dataset (RGB imagery and elevation). Comprehensive evaluations demonstrate the potential of the proposed approach.
ABSTRACT:In this paper we present an approach for semantic interpretation of facade images based on a Convolutional Network. Our network processes the input images in a fully convolutional way and generates pixel-wise predictions. We show that there is no need for large datasets to train the network when transfer learning is employed, i. e., a part of an already existing network is used and fine-tuned, and when the available data is augmented by using deformed patches of the images for training. The network is trained end-to-end with patches of the images and each patch is augmented independently. To undo the downsampling for the classification, we add deconvolutional layers to the network. Outputs of different layers of the network are combined to achieve more precise pixel-wise predictions. We demonstrate the potential of our network based on results for the eTRIMS (Korč and Förstner, 2009) dataset reduced to facades.
ABSTRACT:In this paper we present an approach for semantic interpretation of facade images based on a Convolutional Network. Our network processes the input images in a fully convolutional way and generates pixel-wise predictions. We show that there is no need for large datasets to train the network when transfer learning is employed, i. e., a part of an already existing network is used and fine-tuned, and when the available data is augmented by using deformed patches of the images for training. The network is trained end-to-end with patches of the images and each patch is augmented independently. To undo the downsampling for the classification, we add deconvolutional layers to the network. Outputs of different layers of the network are combined to achieve more precise pixel-wise predictions. We demonstrate the potential of our network based on results for the eTRIMS (Korč and Förstner, 2009) dataset reduced to facades.
ABSTRACT:In recent years, the developments for Fully Convolutional Networks (FCN) have led to great improvements for semantic segmentation in various applications including fused remote sensing data. There is, however, a lack of an in-depth study inside FCN models which would lead to an understanding of the contribution of individual layers to specific classes and their sensitivity to different types of input data. In this paper, we address this problem and propose a fusion model incorporating infrared imagery and Digital Surface Models (DSM) for semantic segmentation. The goal is to utilize heterogeneous data more accurately and effectively in a single model instead of to assemble multiple models. First, the contribution and sensitivity of layers concerning the given classes are quantified by means of their recall in FCN. The contribution of different modalities on the pixel-wise prediction is then analyzed based on visualization. Finally, an optimized scheme for the fusion of layers with color and elevation information into a single FCN model is derived based on the analysis. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset. Comprehensive evaluations demonstrate the potential of the proposed approach.
Abstract. We propose a pipeline for the detection as well as modeling of individual buildings based on multi-source images. It allows to consistently reconstruct whole buildings at Level of Detail 3 (LoD3): the roof from airborne images and the facades including elements such as windows and doors mainly from terrestrial images. We employ a parametrized top-down model – the “shell model” – with the roof as well as the facades semantically and geometrically integrated. This generative model fosters stability for building detection by enabling the use of multi-source data and offers flexibility in modeling by means of a fully CAD-compatible integration of building components. Experiments performed on imagery from different terrestrial and airborne (Unmanned Aerial Vehicle – UAV) cameras demonstrate the potential of the approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.