Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1181
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

Abstract: Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understanding plays in this task, especially because dominant evaluation metrics have focused on goal completion rather than t…
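The abstract's distinction between goal completion and instruction fidelity can be made concrete with a toy comparison. The sketch below is illustrative only: `success_rate` and `path_coverage` are simplified stand-ins for the general idea (a trajectory-level coverage score), not the metric the paper actually defines, and the 3.0 threshold is an arbitrary assumption.

```python
import math

def success_rate(final_pos, goal, threshold=3.0):
    """Goal-completion view: did the agent end near the goal?"""
    return 1.0 if math.dist(final_pos, goal) <= threshold else 0.0

def path_coverage(path, reference_path, threshold=3.0):
    """Fidelity view: how closely does the whole trajectory track the
    reference path? Averages a soft proximity score over reference points.
    (Illustrative coverage score, not the paper's exact metric.)"""
    def dist_to_path(point):
        return min(math.dist(point, p) for p in path)
    return sum(math.exp(-dist_to_path(r) / threshold)
               for r in reference_path) / len(reference_path)

# A detour that still reaches the goal scores perfectly on success rate
# but strictly below 1.0 on coverage.
reference = [(0, 0), (1, 0), (2, 0), (3, 0)]
detour = [(0, 0), (0, 3), (3, 3), (3, 0)]
print(success_rate(detour[-1], reference[-1]))   # 1.0 — goal reached
print(path_coverage(detour, reference) < 1.0)    # True — path not followed
```

This is exactly the failure mode a goal-only metric cannot see: both trajectories "succeed", but only one follows the instructed path.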

Cited by 118 publications (122 citation statements) · References 24 publications
“…Despite recent progress in the area of vision and language, recent work (Jain et al, 2019) in the navigation task (VLN) argues that current research leaves unclear how much of a role language plays in this task. They point out that dominant evaluation metrics have focused on goal completion rather than how each action contributes to the goal.…”
Section: Previous Work
confidence: 99%
“…In a study on VLN tasks [ 7 ], a relatively simple deep neural network model of the sequence-to-sequence (Seq2Seq) type was proposed, in which an action sequence was output from two input sequences with input video stream and natural language instructions, respectively. A few other VLN-related studies [ 9 , 15 , 16 ] presented methods to solve the problem of insufficient R2R datasets for training VLN models. They undertook various data augmentation techniques, including the development of a speaker module to generate additional training data [ 9 ], new training data through environment dropout (eliminating selected objects from the environment) [ 15 ], and more sophisticated task data by concatenating the existing R2R data [ 16 ].…”
Section: Related Work
confidence: 99%
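The sequence-to-sequence interface described in the quote above — instruction and visual observations in, an action sequence out — can be sketched minimally. Everything below (the action set, the bag-of-words "encoder", the additive "decoder") is an assumed stand-in for illustration, not the cited implementation.

```python
# Toy sketch of a Seq2Seq-style VLN interface: condition on an encoded
# instruction plus per-step visual features, emit one action per step.
ACTIONS = ["forward", "left", "right", "stop"]

def encode_instruction(tokens):
    # Stand-in encoder: bag-of-words counts over a tiny assumed vocab.
    vocab = {"go": 0, "left": 1, "right": 2, "stop": 3}
    vec = [0.0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1.0
    return vec

def decode_actions(instr_vec, visual_features):
    # Stand-in decoder: score each action from the instruction encoding
    # plus the current visual feature vector, take the argmax, stop on "stop".
    actions = []
    for feat in visual_features:
        scores = [instr_vec[i] + feat[i] for i in range(len(ACTIONS))]
        a = ACTIONS[max(range(len(ACTIONS)), key=lambda i: scores[i])]
        actions.append(a)
        if a == "stop":
            break
    return actions
```

In a real model both functions would be learned networks; the point here is only the data flow: two input sequences, one output action sequence.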