Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.62

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

Abstract: Vision-and-Language Navigation (VLN) is a natural language grounding task where an agent learns to follow language instructions and navigate to specified destinations in real-world environments. A key challenge is to recognize and stop at the correct location, especially in complicated outdoor environments. Existing methods treat the STOP action the same as other actions, which results in undesirable behavior: the agent often fails to stop at the destination even though it might be on the right path. There…

Cited by 23 publications (22 citation statements) | References 12 publications
“…Ku et al. (2020) report lower SDTW scores of 21% to 24%. Given this, the TC of 12.8% and SDTW of 1.4% obtained by Retouch-RCONCAT, and the current best results from Xiang et al. (2020) (TC: 19.0%; SDTW: 16.3%), amply demonstrate the challenge of the outdoor navigation problem defined by Touchdown. The greater diversity of the visual environments and the far greater degrees of freedom for navigation thus provide plenty of headroom for future research.…”
Section: Methods
confidence: 81%
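The TC and SDTW figures quoted in the citation statement above are standard outdoor-VLN metrics: TC (task completion) checks whether the agent stops near the goal, while SDTW weights that success by how closely the agent's path matched the reference path under dynamic time warping. A minimal sketch of an SDTW-style score, assuming Euclidean distances, a fixed success radius, and the usual exponential normalization of the DTW cost (the function names here are illustrative, not from the paper):

```python
import math

def dtw(pred, ref):
    # Classic dynamic-time-warping cost between two sequences of 2-D points.
    n, m = len(pred), len(ref)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(pred[i - 1], ref[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def sdtw(pred, ref, success_radius=3.0):
    # nDTW normalizes the DTW cost by reference length and the success radius;
    # SDTW zeroes the score when the agent stops outside that radius.
    ndtw = math.exp(-dtw(pred, ref) / (len(ref) * success_radius))
    success = math.dist(pred[-1], ref[-1]) <= success_radius
    return ndtw if success else 0.0
```

A perfect trajectory scores 1.0, while an agent that ends far from the goal scores 0.0 regardless of path fidelity, which is why the "learning to stop" problem discussed in this paper directly affects SDTW.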
“…Vision-and-Language Navigation (VLN) is a task that requires an agent to achieve a final goal based on given instructions in a 3D environment. Besides the generalizability problem studied by previous works (Wang et al., 2019), the data scarcity problem is another critical issue for the VLN task, especially in outdoor environments (Chen et al., 2019; Mehta et al., 2020; Xiang et al., 2020). Fried et al. (2018) obtain a broad set of augmented training data for VLN by sampling trajectories in the navigation environment and using a Speaker model to back-translate their instructions.…”
Section: Related Work
confidence: 99%
“…There are no previous results for multitask SILGNetHack and SymTD, as they are introduced here. Though not comparable, the manual-stop VisTD SOTA trained using imitation learning on supervised trajectories is 16.7% [56]. State-tracking consistently improves convergence and generalization, even when the correct next step is fully determined by current world observations (e.g.…”
Section: Analyses of Recent Grounded Language RL Modelling Contributions
confidence: 95%