Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1197
Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

Abstract: We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research. Where existing work often compares against random or majority class baselines, we argue that unimodal approaches better capture and reflect dataset biases and therefore provide an important comparison when assessing the performance of multimodal techniques. We present unimodal ablations on three recent datasets in visual navigation and QA, seeing an up to …
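One way to operationalize the abstract's recommendation is to report a multimodal model's gain over its strongest unimodal ablation rather than over chance or the majority class. The sketch below is a minimal illustration with invented scores; the function and its inputs are assumptions for the example, not the authors' code.

```python
def shifted_gain(multimodal_score, unimodal_scores, chance_score):
    """Compare a multimodal model against the strongest unimodal
    baseline, not just against chance or the majority class."""
    best_name = max(unimodal_scores, key=unimodal_scores.get)
    best_score = unimodal_scores[best_name]
    return {
        "gain_over_chance": multimodal_score - chance_score,
        "gain_over_best_unimodal": multimodal_score - best_score,
        "best_unimodal": best_name,
    }

# Invented numbers for illustration only.
print(shifted_gain(0.52,
                   {"vision_only": 0.33, "language_only": 0.45},
                   chance_score=0.20))
```

A model that looks strong against chance (here, +32 points) can look far weaker against its best unimodal ablation (+7 points), which is exactly the "shifted baseline" the paper argues for.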

Cited by 64 publications (50 citation statements); references 25 publications.

Citation statements (ordered by relevance):
“…R2R paths span 4-6 edges and are the shortest paths from start to goal. Thomason et al (2019a) showed that agents can exploit effective priors over R2R paths, and showed that R2R paths encourage goal seeking. [Figure 2: Given the panorama navigation graph P with room graph R in Figure 2a, we sample a simple room path (r_0, r_2, r_3) inducing the subgraph in Figure 2b.]…”
Section: Motivation (mentioning)
confidence: 99%
“…Unimodal Ablations: Table 7 reports the performance of the multilingual agent under settings in which we ablate either the vision or the language inputs during both training and evaluation, as advocated by Thomason et al (2019a). The multimodal agent (4) outperforms both the language-only agent (9) and the vision-only agent (10), indicating that both modalities contribute to performance.…”
Section: Multitask and Transfer Learning (mentioning)
confidence: 99%
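As a rough sketch of the protocol this excerpt describes, where the same modality is zeroed out during both training and evaluation, the Python below uses a toy agent with invented feature sizes and action space; none of it reflects the cited implementation.

```python
import torch
from torch import nn

class ToyMultimodalAgent(nn.Module):
    """Toy stand-in for an instruction-following agent that fuses
    vision and language features into action logits."""
    def __init__(self, vis_dim=2048, lang_dim=300, n_actions=6):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + lang_dim, n_actions)

    def forward(self, batch):
        x = torch.cat([batch["vision"], batch["language"]], dim=-1)
        return self.fuse(x)

def run_epoch(model, batches, optimizer=None, ablate=None):
    """One pass over `batches`, zeroing the `ablate` modality in every
    batch so the ablation is identical at train and eval time."""
    loss_fn = nn.CrossEntropyLoss()
    total_loss = 0.0
    for batch, actions in batches:
        if ablate is not None:
            batch = {k: torch.zeros_like(v) if k == ablate else v
                     for k, v in batch.items()}
        loss = loss_fn(model(batch), actions)
        if optimizer is not None:  # training pass only
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        total_loss += loss.item()
    return total_loss / len(batches)

# Synthetic data standing in for a navigation dataset.
batches = [({"vision": torch.randn(8, 2048),
             "language": torch.randn(8, 300)},
            torch.randint(0, 6, (8,))) for _ in range(4)]

agent = ToyMultimodalAgent()
opt = torch.optim.SGD(agent.parameters(), lr=0.01)
run_epoch(agent, batches, optimizer=opt, ablate="language")  # vision-only training
vision_only_loss = run_epoch(agent, batches, ablate="language")  # matching evaluation
```

Applying the same ablation at train and test time matters: an agent trained multimodally but evaluated with one modality zeroed measures robustness to missing inputs, not the unimodal baseline the quote describes.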
“…There have also been concerns about structural biases present in these datasets which may provide hidden shortcuts to agents training on these problems. Thomason et al (2019) presented an analysis on R2R dataset, where the trained agent continued to perform surprisingly well in the absence of language inputs.…”
Section: Room-to-Room (R2R) (mentioning)
confidence: 99%
“…Biases in VQA datasets: A growing body of work points to the existence of biases in popular VQA datasets (Agrawal et al, 2016; Zhang et al, 2016; Jabri et al, 2016; Goyal et al, 2017; Johnson et al, 2017; Chao et al, 2018; Thomason et al, 2019). In VQA v1 (Antol et al, 2015), for instance, for questions of the form, "What sport is...?"…”
Section: Related Work (mentioning)
confidence: 99%
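The question-type prior this excerpt alludes to is easy to quantify with a majority-class baseline conditioned on a question prefix. The sketch below is an illustrative toy, not the method of any cited work; the three-token prefix heuristic and the example data are assumptions.

```python
from collections import Counter, defaultdict

def question_type(question, n_tokens=3):
    """Crude question-type key: the first few tokens of the question."""
    return " ".join(question.lower().split()[:n_tokens])

def fit_majority_baseline(qa_pairs):
    """Most frequent training answer per question type: a language-only
    baseline that exploits priors like 'what sport is' -> a single
    dominant answer, with no access to the image at all."""
    answers_by_type = defaultdict(Counter)
    for question, answer in qa_pairs:
        answers_by_type[question_type(question)][answer] += 1
    return {t: counts.most_common(1)[0][0]
            for t, counts in answers_by_type.items()}

# Toy data illustrating the kind of prior the quote describes.
train_pairs = [
    ("What sport is the man playing?", "tennis"),
    ("What sport is shown on TV?", "tennis"),
    ("What sport is this?", "baseball"),
    ("What color is the bus?", "red"),
]
baseline = fit_majority_baseline(train_pairs)
print(baseline[question_type("What sport is being played?")])  # -> tennis
```

If a baseline like this scores well above chance, the dataset rewards answering from the question alone, which is the bias the cited analyses document.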