Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Hu, Di; Li, Xuhong; Mou, Lichao; Pu, Jun; Chen, Dong; Jing, Liping; Zhu, Xiaoxiang; Dou, Dejing

doi:10.48550/arxiv.2005.08449

Cited by 3 publications

(2 citation statements)

References 29 publications

“…Beyond videos, other interesting examples include VQA (see Direction 3, page 8), captioning [27] and audiovisual reasoning, i.e., linking remote sensing images to in-situ audio signals [28]. In the long run, we hope that reasoning Earth observation systems would be capable of deduce clues and make structural inference, in order to explain processes (see direction 5, page 13) and understand causal structures in Earth Systems (see direction 6, page 16).…”

Section: Perspectives (mentioning)

confidence: 99%

Toward a Collective Agenda on AI for Earth Science Data Analysis

Tuia, Roscher, Wegner, et al. 2021

IEEE Geosci. Remote Sens. Mag.

Self Cite

This is the pre-acceptance version; to read the final version published in the Geoscience and Remote Sensing Magazine, please go to: 10.1109/MGRS.2020.3043504. In the last years we have witnessed the fields of geosciences and remote sensing and artificial intelligence become closer. Thanks to both the massive availability of observational data, improved simulations, and algorithmic advances, these disciplines have found common objectives and challenges to advance the modeling and understanding of the Earth system. Despite such great opportunities, we also observed a worrying tendency to remain in disciplinary comfort zones applying recent advances from artificial intelligence on well resolved remote sensing problems. Here we take a position on research directions where we think the interface between these fields will have the most impact and become potential game changers. In our declared agenda for AI on Earth sciences, we aim to inspire researchers, especially the younger generations, to tackle these challenges for a real advance of remote sensing and the geosciences.

“…In recent years, many efforts [19], e.g., developing novel network architectures [20,21,22,23,24,25] and pipelines [26,27,28,29], publishing large-scale datasets [30,31], introducing multi-modal and multi-temporal data [32,33,34,35], have been deployed to address this task, and most of them treat it as a single-label classification problem. A common assumption shared by these researches is that an aerial image belongs to only one scene category, while in real-world scenarios, it is more often that there exist various scenes in a single image (cf.…”

Section: Introduction (mentioning)

confidence: 99%

Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks

Hua, Mou, Lin, et al. 2021

Preprint

This is a preprint. To read the final version please visit ISPRS Journal of Photogrammetry and Remote Sensing. Aerial scene recognition is a fundamental visual task and has attracted an increasing research interest in the last few years. Most of current researches mainly deploy efforts to categorize an aerial image into one scene-level label, while in real-world scenarios, there often exist multiple scenes in a single image. Therefore, in this paper, we propose to take a step forward to a more practical and challenging task, namely multi-scene recognition in single images. Moreover, we note that manually yielding annotations for such a task is extraordinarily time- and labor-consuming. To address this, we propose a prototype-based memory network to recognize multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. To be more specific, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. Afterwards, a multi-head attention-based memory retrieval module