Multi-modal fusion of satellite and street-view images for urban village classification based on a dual-branch deep neural network

Chen, Boan; Feng, Quanlong; Niu, Bowen; Yan, Fengqin; Gao, Bingbo; Yang, Jianyu; Gong, Jianhua; Liu, Jiantao

doi:10.1016/j.jag.2022.102794

Cited by 27 publications

(8 citation statements)

References 34 publications

“…Although deep learning has been popular nowadays, the training of a deep neural network needs a huge number of labelled samples 19 . Otherwise, the deep learning model would be easily overfitted on limited training samples and show poor performance when predicting the new unseen datasets.…”

Section: Background and Summary (mentioning)

confidence: 99%

A 10-m national-scale map of ground-mounted photovoltaic power stations in China of 2020

Feng, Niu, Ren et al. 2024

Sci Data

Self Cite

We provide a remote sensing derived dataset for large-scale ground-mounted photovoltaic (PV) power stations in China of 2020, which has high spatial resolution of 10 meters. The dataset is based on the Google Earth Engine (GEE) cloud computing platform via random forest classifier and active learning strategy. Specifically, ground samples are carefully collected across China via both field survey and visual interpretation. Afterwards, spectral and texture features are calculated from publicly available Sentinel-2 imagery. Meanwhile, topographic features consisting of slope and aspect that are sensitive to PV locations are also included, aiming to construct a multi-dimensional and discriminative feature space. Finally, the trained random forest model is adopted to predict PV power stations of China parallelly on GEE. Technical validation has been carefully performed across China which achieved a satisfactory accuracy over 89%. Above all, as the first publicly released 10-m national-scale distribution dataset of China’s ground-mounted PV power stations, it can provide data references for relevant researchers in fields such as energy, land, remote sensing and environmental sciences.
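The feature-space construction described in this abstract (spectral features from Sentinel-2 plus slope and aspect) can be illustrated with a minimal sketch. This is not the authors' code: the choice of NDVI as the spectral feature, the function names, and the central-difference slope/aspect formulas are all assumptions made for illustration, and the random forest training step on GEE is not shown.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance.

    NDVI is an assumed example of a spectral feature; the abstract does not name it.
    """
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def slope_aspect(dem, row, col, cell_size=10.0):
    """Slope and aspect in degrees at an interior DEM cell (central differences).

    Assumes a north-up grid: rows increase southward, columns increase eastward.
    Aspect is the compass bearing of the downslope direction (0 = north, 90 = east).
    On perfectly flat terrain the aspect value is not meaningful.
    """
    dz_east = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_north = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0
    return slope, aspect


def feature_vector(nir, red, dem, row, col):
    """Stack one spectral and two topographic features for a single pixel."""
    slope, aspect = slope_aspect(dem, row, col)
    return [ndvi(nir, red), slope, aspect]
```

For a surface that rises 10 m per 10 m cell toward the east, `slope_aspect` returns a 45 degree slope facing west (aspect 270). Vectors of this kind, one per pixel, are what a classifier such as a random forest would take as input.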

“…Recent studies employ deep learning techniques, particularly convolutional neural networks (CNN), to automatically learn discriminative features from satellite images. For example, some studies (Chen et al 2022;Fan et al 2022a) classify urban villages by constructing various deep learning models over satellite images and street images. Another study (Fan et al 2022b) classifies urban informal settlements using very high-resolution remote sensing images and timeseries population density data.…”

Section: Related Work (mentioning)

confidence: 99%

“…In recent years, exploring computer vision techniques with satellite images for urban villages has gained significant attention. Most studies build image classification models to classify whether a given satellite image contains an urban village (Chen et al 2022;Fan et al 2022a,b;Xiao et al 2023) without boundaries identified, while others explore semantic segmentation models to identify urban village boundaries in satellite images (Mast, Wei, and Wurm 2020;Pan et al 2020;Chen et al 2019). However, due to the complex background interference in satellite images and the lack of well-defined boundaries between urban villages and surrounding neighborhoods, existing studies perform poorly in providing accurate urban village boundaries, which further hinders the estimation of the areas and expansions of urban villages (Kirillov et al 2023).…”

Section: Introduction (mentioning)

confidence: 99%

UV-SAM: Adapting Segment Anything Model for Urban Village Identification

Zhang, Liu, Lin et al. 2024

AAAI

Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities. Traditionally, governments heavily depend on field survey methods to monitor the urban villages, which however are time-consuming, labor-intensive, and possibly delayed. Thanks to widely available and timely updated satellite images, recent studies develop computer vision techniques to detect urban villages efficiently. However, existing studies either focus on simple urban village image classification or fail to provide accurate boundary information. To accurately identify urban village boundaries from satellite images, we harness the power of the vision foundation model and adapt the Segment Anything Model (SAM) to urban village segmentation, named UV-SAM. Specifically, UV-SAM first leverages a small-sized semantic segmentation model to produce mixed prompts for urban villages, including mask, bounding box, and image representations, which are then fed into SAM for fine-grained boundary identification. Extensive experimental results on two datasets in China demonstrate that UV-SAM outperforms existing baselines, and identification results over multiple years show that both the number and area of urban villages are decreasing over time, providing deeper insights into the development trends of urban villages and sheds light on the vision foundation models for sustainable cities. The dataset and codes of this study are available at https://github.com/tsinghua-fib-lab/UV-SAM.

“…Nonetheless, this bottleneck architecture is within the transformer and not as a single transformer. In the satellite imagery areas, [41] proposed a multimodal fusion architecture using multiple image sources. The modalities features are extracted using LSTM cells and a modified ViT transformer.…”

Section: Multimodal Transformers Architectures (mentioning)

confidence: 99%

Fusion of Satellite Images and Weather Data With Transformer Networks for Downy Mildew Disease Detection

2023

Crop diseases significantly affect the quantity and quality of agricultural production. In a context where the goal of precision agriculture is to minimize or even avoid the use of pesticides, weather and remote sensing data with deep learning can play a pivotal role in detecting crop diseases, allowing localized treatment of crops. However, combining heterogeneous data such as weather and images remains a hot topic and challenging task. Recent developments in transformer architectures have shown the possibility of fusion of data from different domains, such as text-image. The current trend is to custom only one transformer to create a multimodal fusion model. Conversely, we propose a new approach to realize data fusion using three transformers. In this paper, we first solved the missing satellite images problem, by interpolating them with a ConvLSTM model. Then, we proposed a multimodal fusion architecture that jointly learns to process visual and weather information. The architecture is built from three main components, a Vision Transformer and two transformer-encoders, allowing to fuse both image and weather modalities. The results of the proposed method are promising achieving an overall accuracy of 97%.
