Proceedings of the Third International Workshop on Spatial Language Understanding 2020
DOI: 10.18653/v1/2020.splu-1.7

Retouchdown: Releasing Touchdown on StreetLearn as a Public Resource for Language Grounding Tasks in Street View

Abstract: The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLearn data release (Mirowski et al., 2019) to check panoramas for personally identifiable information and blur them a…

Cited by 28 publications (40 citation statements). References 11 publications.
“…A number of VLN datasets situated in photorealistic 3D reconstructions of real locations contain human instructions or dialogue: R2R (Anderson et al., 2018b), Touchdown (Chen et al., 2019; Mehta et al., 2020), CVDN (Thomason et al., 2019b) and REVERIE. RxR addresses shortcomings of these datasets, in particular multilinguality, scale, fine-grained word grounding, and human follower demonstrations (Table 1).…”
Section: Motivation
confidence: 99%
“…Vision-and-Language Navigation (VLN) tasks require computational agents to mediate the relationship between language, visual scenes and movement. Datasets have been collected for both indoor (Anderson et al., 2018b; Thomason et al., 2019b) and outdoor (Chen et al., 2019; Mehta et al., 2020) environments; success in these is based on clearly defined, objective task completion rather than language- or vision-specific annotations. These VLN tasks fall in the Goldilocks zone: they can be tackled, but not solved, with current methods, and progress on them makes headway on real-world grounded language understanding.…”
Section: Introduction
confidence: 99%
“…Embodied Language Tasks. A number of 'Embodied AI' tasks combining language, visual perception, and navigation in realistic 3D environments have recently gained prominence, including Interactive and Embodied Question Answering (Das et al., 2018; Gordon et al., 2018), Vision-and-Language Navigation or VLN (Anderson et al., 2018; Chen et al., 2019; Mehta et al., 2020; Qi et al., 2020), and challenges based on household tasks (Puig et al., 2018; Shridhar et al., 2020). While these tasks utilize only a single question or instruction input, several papers have extended the VLN task, in which an agent must follow natural language instructions to traverse a path in the environment, to dialog settings.…”
Section: Related Work
confidence: 99%
“…We use the same experimental setup as in Touchdown-SDR, using the scenes provided in the concurrent work (Mehta et al., 2020), where we slice the 360° scene into 8 FoVs covering the scene. We pass each of these FoVs to a pre-trained model (He et al., 2016) and extract features from the fourth-to-last layer (before classification) to get a feature map representation of the FoVs.…”
Section: Localization Experiments
confidence: 99%
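
As a rough illustration of the feature-extraction recipe quoted above, the sketch below slices an equirectangular panorama into 8 horizontal FoV crops and runs each through a pre-trained ResNet, keeping an intermediate spatial feature map. The crop geometry, the ResNet-18 variant, input resolution, and the layer kept are illustrative assumptions, not the exact setup of the cited work or the Retouchdown release.

```python
# Hedged sketch: split a 360-degree equirectangular panorama into 8 FoV crops
# and extract a spatial feature map per crop with a pre-trained ResNet.
# Crop geometry, model variant, and chosen layer are assumptions for illustration.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def slice_panorama(pano: Image.Image, num_fovs: int = 8):
    """Split a panorama into equal-width horizontal crops."""
    width, height = pano.size
    fov_width = width // num_fovs
    return [pano.crop((i * fov_width, 0, (i + 1) * fov_width, height))
            for i in range(num_fovs)]

# Pre-trained ResNet-18; drop the average pool and classification head so the
# output is a spatial feature map rather than class logits.
resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def panorama_features(pano: Image.Image) -> torch.Tensor:
    """Return a (num_fovs, C, H, W) tensor of feature maps, one per FoV crop."""
    crops = slice_panorama(pano)
    batch = torch.stack([preprocess(c) for c in crops])  # (8, 3, 224, 224)
    return feature_extractor(batch)                      # (8, 512, 7, 7) for ResNet-18

# Example usage with a hypothetical panorama file:
# pano = Image.open("panorama.jpg").convert("RGB")
# feats = panorama_features(pano)
```

The resulting per-FoV feature maps preserve spatial layout within each crop, which is what a downstream localization or grounding model would attend over; the specific downstream use is not shown here.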