In this paper, we establish a workflow for the estimation of built-up density and height based on multispectral Sentinel-2 data. To do so, we frame the estimation of built-up density and height as a supervised learning problem. Given the ratio level of measurement of those two target variables, the regression estimation problem is regarded as finding the mapping between an input vector, i.e., ubiquitously available features computed from Sentinel-2 data, and an observable output (i.e., the training set), which is derived over spatially limited areas in an automated manner. Specifically, training sets are automatically generated from a joint exploitation of TanDEM-X mission elevation data and Sentinel-2 imagery, and, as an alternative, from cadastral sources. The training sets are used to regress the target variables for spatial processing units which correspond to urban neighborhood scales. From a methodological point of view, we introduce a novel ensemble regression approach, i.e., multistrategy ensemble regression (MSER), based on advanced machine learning-based regression algorithms including Random Forest Regression, Support Vector Regression, Gaussian Process Regression, and Neural Network Regression. To establish a robust ensemble, those algorithms are trained with a modified version of the AdaBoost.RT algorithm. To reliably ensure diversity between single boosted regressors, we additionally include a random feature subspace method in the procedure. In contrast to existing approaches, we selectively prune non-favorable regressors trained during the boosting procedure and calculate the final prediction by a weighted mean function on the remaining models to ensure enhanced accuracy properties of predictions. Finally, outputs are combined into a single prediction with a decision fusion strategy. Experimental results are obtained from four test areas which cover the settlement areas of the four largest German cities, i.e., Berlin, Hamburg, Munich, and Cologne.
The results unambiguously underline the beneficial properties of the MSER approach, since all best predictions in a comparative setup were obtained with a boosted regressor in conjunction with a decision fusion strategy. The mean absolute errors of the corresponding models vary between 3-16% and 1-5.4 m with respect to built-up density and height, respectively, depending on the validation strategy, the size of the spatial processing units, and the test area. In a domain adaptation setup (i.e., when learning a model over a source domain and applying it over a geographically different target domain), numerous predictions also show accuracy levels comparable to those of predictions obtained within a source domain. This further underlines the viability of transferring a model and, thus, of substituting the training data in the target domains.
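The pruning and weighted-mean step described above can be illustrated with a minimal sketch. The abstract does not state the pruning criterion or the weighting function, so both are assumptions here: regressors are pruned when their validation mean absolute error exceeds a hypothetical threshold `max_mae`, and the remaining models are weighted by inverse error. All function names and numbers are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between reference values and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def prune_and_weight(
    val_preds: Sequence[Sequence[float]],
    y_val: Sequence[float],
    max_mae: float,
) -> Dict[int, float]:
    """Keep regressors with validation MAE <= max_mae (assumed criterion)
    and return normalized inverse-MAE weights (assumed weighting)."""
    maes = [mean_absolute_error(y_val, p) for p in val_preds]
    kept = [i for i, m in enumerate(maes) if m <= max_mae]
    if not kept:
        # Fall back to the single best regressor if all would be pruned.
        kept = [min(range(len(maes)), key=maes.__getitem__)]
    eps = 1e-12  # guards against division by zero for a perfect regressor
    raw = {i: 1.0 / (maes[i] + eps) for i in kept}
    total = sum(raw.values())
    return {i: w / total for i, w in raw.items()}


def weighted_mean_prediction(
    preds: Sequence[Sequence[float]],
    weights: Dict[int, float],
) -> List[float]:
    """Weighted mean over the predictions of the retained regressors."""
    n_samples = len(preds[0])
    return [
        sum(w * preds[i][j] for i, w in weights.items())
        for j in range(n_samples)
    ]


# Toy data: three regressors evaluated on two validation samples.
y_val = [1.0, 2.0]
val_preds = [[1.5, 2.5], [2.0, 3.0], [5.0, 6.0]]  # MAE: 0.5, 1.0, 4.0
weights = prune_and_weight(val_preds, y_val, max_mae=1.0)

# Predictions of the same three regressors for one new processing unit.
new_preds = [[3.0], [6.0], [100.0]]
fused = weighted_mean_prediction(new_preds, weights)
```

In this toy setting the third regressor is pruned, so its outlying prediction does not affect the fused value; the two retained regressors contribute in proportion to their inverse validation error.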
In this letter, we establish two sampling schemes to select training and test sets for supervised classification. We do this in order to investigate whether the estimated generalization capabilities of learned models can be positively biased by the use of spatial features. Numerous spatial features impose homogeneity constraints on the image data, whereby a spatially connected set of image elements is attributed identical feature values. In addition to a frequent occurrence of intrinsic spatial autocorrelation, this leads to extrinsic spatial autocorrelation with respect to the image data. The first sampling scheme follows a spatially random partitioning into training and test sets. In contrast, the second scheme implements a spatially disjoint partitioning, which considers in particular the topological constraints that arise from the deployment of spatial features. Experimental results are obtained from multi- and hyperspectral acquisitions over urban environments. They underline that a large share of the differences between the estimated generalization capabilities obtained with the spatially disjoint and non-disjoint sampling strategies can be attributed to the use of spatial features, whereby the differences increase with the size of the spatial neighborhood considered for computing a spatial feature. This stresses the necessity of a proper spatial sampling scheme for model evaluation to avoid overoptimistic model assessments.
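The contrast between the two sampling schemes can be sketched as follows. This is a simplified illustration and not the letter's procedure: it assumes that spatial features are computed in a square window of half-width `radius`, and it obtains the disjoint partition by splitting the image along a hypothetical column `split_col` and discarding a buffer strip, so that no training window overlaps a test window.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def random_split(
    pixels: List[Pixel], test_fraction: float, seed: int = 0
) -> Tuple[List[Pixel], List[Pixel]]:
    """Spatially random partitioning: test pixels may neighbor training pixels."""
    rng = random.Random(seed)
    shuffled = pixels[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def disjoint_split(
    pixels: List[Pixel], split_col: int, radius: int
) -> Tuple[List[Pixel], List[Pixel]]:
    """Spatially disjoint partitioning with a buffer around split_col.

    Pixels whose column lies in [split_col - radius, split_col + radius)
    are dropped, so feature windows of half-width `radius` centered on
    training and test pixels cannot share any image element.
    """
    train = [(r, c) for r, c in pixels if c < split_col - radius]
    test = [(r, c) for r, c in pixels if c >= split_col + radius]
    return train, test


def min_chebyshev_distance(train: List[Pixel], test: List[Pixel]) -> int:
    """Smallest chessboard distance between any training and test pixel."""
    return min(
        max(abs(r1 - r2), abs(c1 - c2))
        for r1, c1 in train
        for r2, c2 in test
    )


# Toy 10 x 10 image and a 5 x 5 feature window (radius = 2).
pixels = [(r, c) for r in range(10) for c in range(10)]
radius = 2

rand_train, rand_test = random_split(pixels, test_fraction=0.3)
disj_train, disj_test = disjoint_split(pixels, split_col=5, radius=radius)
disj_gap = min_chebyshev_distance(disj_train, disj_test)
```

Two windows of half-width `radius` overlap only if their centers are at most `2 * radius` apart, so a gap larger than that guarantees that no test sample shares image elements with a training sample through a spatial feature. The random split offers no such guarantee, which is the source of the optimistic bias discussed above.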