2020
DOI: 10.48550/arxiv.2005.08449
Preprint

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Citations: cited by 3 publications (2 citation statements)
References: 29 publications
“…Beyond videos, other interesting examples include VQA (see Direction 3, page 8), captioning [27] and audiovisual reasoning, i.e., linking remote sensing images to in-situ audio signals [28]. In the long run, we hope that reasoning Earth observation systems would be capable of deduce clues and make structural inference, in order to explain processes (see direction 5, page 13) and understand causal structures in Earth Systems (see direction 6, page 16).…”
Section: Perspectives
Citation type: mentioning
confidence: 99%
“…In recent years, many efforts [19], e.g., developing novel network architectures [20,21,22,23,24,25] and pipelines [26,27,28,29], publishing large-scale datasets [30,31], introducing multi-modal and multi-temporal data [32,33,34,35], have been deployed to address this task, and most of them treat it as a single-label classification problem. A common assumption shared by these researches is that an aerial image belongs to only one scene category, while in real-world scenarios, it is more often that there exist various scenes in a single image (cf.…”
Section: Introduction
Citation type: mentioning
confidence: 99%