Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.438
Diagnosing Vision-and-Language Navigation: What Really Matters

Abstract: Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments. Multiple setups have been proposed, and researchers apply new model architectures or training techniques to boost navigation performance. However, non-negligible gaps remain between machines' performance and human benchmarks. Moreover, the agents' inner mechanisms for navigation decisions remain unclear. To the best of our knowledge, how the agents perc…

Cited by 26 publications (12 citation statements)
References 55 publications
“…While task completion drops significantly when direction tokens are masked, the agent still performs at a high level. This finding is surprising and at odds with Zhu et al. (2021a), who report that task completion drops nearly to zero when direction tokens are masked during testing only. We believe that in our setting (masking during both training and testing), the model learns to infer the correct directions from redundancies in the instructions or from the context around the direction tokens.…”
Section: Token Masking (mentioning)
confidence: 57%
“…To analyze the importance of direction and object tokens in the navigation instructions, we run masking experiments similar to Zhu et al. (2021a), except that we mask the tokens during both training and testing instead of during testing only. Figure 4 shows the resulting task completion rates for an increasing number of masked direction or object tokens.…”
Section: Token Masking (mentioning)
confidence: 99%
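To make the masking setup in this citation statement concrete, here is a minimal Python sketch of masking direction tokens in an instruction, applied identically at training and test time. The direction vocabulary, mask symbol, and function name are illustrative assumptions, not taken from either paper.

import random

# Illustrative direction vocabulary -- an assumption, not the exact
# token list used by Zhu et al. (2021a) or the citing paper.
DIRECTION_TOKENS = {"left", "right", "forward", "straight", "around", "turn", "up", "down"}
MASK = "[MASK]"

def mask_direction_tokens(instruction: str, n_masked: int, seed: int = 0) -> str:
    """Replace up to `n_masked` direction tokens in an instruction with [MASK].

    In the setup described above, this is applied during both training and
    testing, so the model can learn to recover directions from the
    surrounding context instead of simply failing on unseen masks.
    """
    rng = random.Random(seed)
    tokens = instruction.lower().split()
    candidates = [i for i, tok in enumerate(tokens) if tok in DIRECTION_TOKENS]
    for i in rng.sample(candidates, min(n_masked, len(candidates))):
        tokens[i] = MASK
    return " ".join(tokens)

# Example: two of the three direction tokens get masked.
print(mask_direction_tokens("Turn left and walk straight to the kitchen", n_masked=2))

Masking at training time as well as test time is what lets the model exploit redundancy: it sees masked instructions during learning and can adapt, rather than encountering masks for the first time at evaluation.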
“…High-level features such as visual appearance, route structure, and detected objects outperform the low-level visual features extracted by a CNN (Hu et al., 2019). Different types of tokens within the instruction also function differently (Zhu et al., 2021b). Extracting these tokens and encoding the object tokens and direction tokens are crucial (Zhu et al., 2021b).…”
Section: Semantic Understanding (mentioning)
confidence: 99%
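A rough sketch of what extracting the two token types could look like, so that direction and object tokens can be encoded by separate streams; the keyword lists and function are hypothetical stand-ins for whatever tagger the cited works actually use.

# Hypothetical keyword lists -- stand-ins for the direction/object
# extraction method in Zhu et al. (2021b), which may differ.
DIRECTION_TOKENS = {"left", "right", "forward", "straight", "turn"}
OBJECT_TOKENS = {"door", "table", "kitchen", "stairs", "chair", "sofa"}

def split_token_types(instruction: str):
    """Partition instruction token indices into direction/object/other groups,
    so each group can be embedded or encoded separately downstream."""
    directions, objects, other = [], [], []
    for i, tok in enumerate(instruction.lower().split()):
        if tok in DIRECTION_TOKENS:
            directions.append(i)
        elif tok in OBJECT_TOKENS:
            objects.append(i)
        else:
            other.append(i)
    return directions, objects, other

print(split_token_types("Turn left at the sofa and stop by the door"))
# -> ([0, 1], [4, 9], [2, 3, 5, 6, 7, 8])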