Earth observation (EO) sensors deliver data at daily or weekly intervals. Most land use and land cover classification (LULC) approaches, however, are designed for cloud-free and mono-temporal observations. The increasing temporal capabilities of today's sensors enable the use of temporal, along with spectral and spatial features.Domains such as speech recognition or neural machine translation, work with inherently temporal data and, today, achieve impressive results by using sequential encoder-decoder structures. Inspired by these sequence-to-sequence models, we adapt an encoder structure with convolutional recurrent layers in order to approximate a phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images. In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy images and find several recurrent cells that reduce the input activity for cloudy observations. Hence, we assume that our network has learned cloud-filtering schemes solely from input data, which could alleviate the need for tedious cloud-filtering as a preprocessing step for many EO approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data, our experiments achieved state-of-the-art classification accuracies on a large number of crop classes with minimal preprocessing, compared to other classification approaches.
0000−0002−6084−2272] , Eleonora Vig 1[0000−0002−7015−6874] , Reza Bahmanyar 1[0000−0002−6999−714X] , Marco Körner 2[0000−0002−9186−4175] , and Peter Reinartz 1[0000−0002−8122−1475]Abstract. Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.
This is the pre-print version, to read the final version please go to ISPRS Journal of Photogrammetry and Remote Sensing, Elsevier. (https://doi.org/DOI: 10.1016/j.isprsjprs.2018.02.006). Land-use classification based on spaceborne or aerial remote sensing images has been extensively studied over the past decades. Such classification is usually a patch-wise or pixel-wise labeling over the whole image. But for many applications, such as urban population density mapping or urban utility planning, a classification map based on individual buildings is much more informative. However, such semantic classification still poses some fundamental challenges, for example, how to retrieve fine boundaries of individual buildings. In this paper, we proposed a general framework for classifying the functionality of individual buildings. The proposed method is based on Convolutional Neural Networks (CNNs) which classify façade structures from street view images, such as Google StreetView, in addition to remote sensing images which usually only show roof structures. Geographic information was utilized to mask out individual buildings, and to associate the corresponding street view images. We created a benchmark dataset which was used for training and evaluating CNNs. In addition, the method was applied to generate building classification maps on both region and city scales of several cities in Canada and the US.
While an increasing interest in deep models for single-image depth estimation (SIDE) can be observed, established schemes for their evaluation are still limited. We propose a set of novel quality criteria, allowing for a more detailed analysis by focusing on specific characteristics of depth maps. In particular, we address the preservation of edges and planar regions, depth consistency, and absolute distance accuracy. In order to employ these metrics to evaluate and compare state-of-the-art SIDE approaches, we provide a new high-quality RGB-D dataset. We used a digital single-lens reflex (DSLR) camera together with a laser scanner to acquire high-resolution images and highly accurate depth maps. Experimental results show the validity of our proposed evaluation protocol.
Automatic building extraction and delineation from high-resolution satellite imagery is an important but very challenging task, due to the extremely large diversity of building appearances. Nowadays, it is possible to use multiple high-resolution remote sensing data sources which allow the integration of different information in order to improve the extraction accuracy of building outlines. Many algorithms are built on spectral-based or appearance-based criteria, from single or fused data sources, to perform the building footprint extraction. But the features for these algorithms are usually manually extracted, which limits their accuracy. Recently developed fully convolutional networks (FCNs), which are similar to normal convolutional neural networks (CNNs), but the last fully connected layer is replaced by another convolution layer with a large "receptive field", quickly became the state-of-the-art method for image recognition tasks, as they bring the possibility to perform dense pixel-wise classification of input images. Based on these advantages, i.e., the automatic extraction of relevant features, and dense classification of images, we propose an end-to-end fully convolutional network (FCN) which effectively combines the spectral and height information from different data sources and automatically generates a full resolution binary building mask. Our architecture (FUSED-FCN4S) consists of three parallel networks merged at a late stage, which helps propagating fine detailed information from earlier layers to higher-levels, in order to produce an output with more accurate building outlines. The inputs to the proposed Fused-FCN4s are three-band (RGB), panchromatic (PAN), and normalized digital surface model (nDSM) images. Experimental results demonstrate that the fusion of several networks is able to achieve excellent results on complex data. Moreover, the developed model was successfully applied to different cities to show its generalization capacity.
ABSTRACT:In optical remote sensing, spatial resolution of images is crucial for numerous applications. Space-borne systems are most likely to be affected by a lack of spatial resolution, due to their natural disadvantage of a large distance between the sensor and the sensed object. Thus, methods for single-image super resolution are desirable to exceed the limits of the sensor. Apart from assisting visual inspection of datasets, post-processing operations-e.g., segmentation or feature extraction-can benefit from detailed and distinguishable structures. In this paper, we show that recently introduced state-of-the-art approaches for single-image super resolution of conventional photographs, making use of deep learning techniques, such as convolutional neural networks (CNN), can successfully be applied to remote sensing data. With a huge amount of training data available, end-to-end learning is reasonably easy to apply and can achieve results unattainable using conventional handcrafted algorithms. We trained our CNN on a specifically designed, domain-specific dataset, in order to take into account the special characteristics of multispectral remote sensing data. This dataset consists of publicly available SENTINEL-2 images featuring 13 spectral bands, a ground resolution of up to 10 m, and a high radiometric resolution and thus satisfying our requirements in terms of quality and quantity. In experiments, we obtained results superior compared to competing approaches trained on generic image sets, which failed to reasonably scale satellite images with a high radiometric resolution, as well as conventional interpolation methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.