2022
DOI: 10.1016/j.jag.2022.102794
|View full text |Cite
|
Sign up to set email alerts
|

Multi-modal fusion of satellite and street-view images for urban village classification based on a dual-branch deep neural network

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
5
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
8
1

Relationship

2
7

Authors

Journals

citations
Cited by 27 publications
(8 citation statements)
references
References 34 publications
0
5
0
Order By: Relevance
“…Although deep learning has been popular nowadays, the training of a deep neural network needs a huge number of labelled samples 19 . Otherwise, the deep learning model would be easily overfitted on limited training samples and show poor performance when predicting the new unseen datasets.…”
Section: Background and Summarymentioning
confidence: 99%
“…Although deep learning has been popular nowadays, the training of a deep neural network needs a huge number of labelled samples 19 . Otherwise, the deep learning model would be easily overfitted on limited training samples and show poor performance when predicting the new unseen datasets.…”
Section: Background and Summarymentioning
confidence: 99%
“…Recent studies employ deep learning techniques, particularly convolutional neural networks (CNN), to automatically learn discriminative features from satellite images. For example, some studies (Chen et al 2022;Fan et al 2022a) classify urban villages by constructing various deep learning models over satellite images and street images. Another study (Fan et al 2022b) classifies urban informal settlements using very high-resolution remote sensing images and timeseries population density data.…”
Section: Related Workmentioning
confidence: 99%
“…In recent years, exploring computer vision techniques with satellite images for urban villages has gained significant attention. Most studies build image classification models to classify whether a given satellite image contains an urban village (Chen et al 2022;Fan et al 2022a,b;Xiao et al 2023) without boundaries identified, while others explore semantic segmentation models to identify urban village boundaries in satellite images (Mast, Wei, and Wurm 2020;Pan et al 2020;Chen et al 2019). However, due to the complex background interference in satellite images and the lack of well-defined boundaries between urban villages and surrounding neighborhoods, existing studies perform poorly in providing accurate urban village boundaries, which further hinders the estimation of the areas and expansions of urban villages (Kirillov et al 2023).…”
Section: Introductionmentioning
confidence: 99%
“…Nonetheless, this bottleneck architecture is within the transformer and not as a single transformer. In the satellite imagery areas, [41] proposed a multimodal fusion architecture using multiple image sources. The modalities features are extracted using LSTM cells and a modified ViT transformer.…”
Section: ) Multimodal Transformers Architecturesmentioning
confidence: 99%