2015
DOI: 10.48550/arxiv.1506.04089
Preprint

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

Abstract: We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence "regions" salient to the current world state by using multiple abstra…
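As a rough illustration of the architecture described in the abstract, the sketch below encodes an instruction with an LSTM, attends over the word annotations, and emits one action per step conditioned on the observable world state. It is a minimal, hypothetical PyTorch sketch: the class name, layer sizes, and the world-state encoding are assumptions for illustration, not the authors' released implementation, and the aligner here is single-level for brevity (the paper's multi-level variant is sketched further below).

```python
import torch
import torch.nn as nn


class InstructionFollower(nn.Module):
    """Hypothetical encoder-decoder: encode the instruction ("listen"),
    attend over word annotations ("attend"), and emit one action per step
    conditioned on the observable world state ("walk")."""

    def __init__(self, vocab_size, world_dim, num_actions, hidden=128):
        super().__init__()
        self.hidden = hidden
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
        self.decoder = nn.LSTMCell(world_dim + 2 * hidden, hidden)
        self.align = nn.Linear(hidden + 2 * hidden, 1)   # single-level aligner for brevity
        self.action_head = nn.Linear(hidden, num_actions)

    def forward(self, instruction, world_states):
        # instruction: (B, T_words) word ids; world_states: (B, T_steps, world_dim)
        annotations, _ = self.encoder(self.embed(instruction))   # (B, T_words, 2H)
        h = annotations.new_zeros(instruction.size(0), self.hidden)
        c = torch.zeros_like(h)
        actions = []
        for t in range(world_states.size(1)):
            # Score each word annotation against the current decoder state.
            query = h.unsqueeze(1).expand(-1, annotations.size(1), -1)
            weights = torch.softmax(self.align(torch.cat([query, annotations], -1)), dim=1)
            context = (weights * annotations).sum(dim=1)          # (B, 2H)
            h, c = self.decoder(torch.cat([world_states[:, t], context], -1), (h, c))
            actions.append(self.action_head(h))
        return torch.stack(actions, dim=1)   # (B, T_steps, num_actions) action logits
```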

Cited by 54 publications (10 citation statements). References 27 publications.
“…In order to better address the selective generation task, we propose a coarse-to-fine aligner that prevents the model from being distracted by non-salient records. Our model aligns based on multiple abstractions of the input: both the original input record as well as the hidden annotations m_j = (r_j; h_j), an approach that has previously been shown to yield better results than aligning based only on the hidden state (Mei et al., 2015).…”
Section: The Model (mentioning; confidence: 99%)
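The multi-level alignment quoted above can be sketched as follows. This is a minimal illustration under assumed names and shapes (r, h, W, and multi_level_context are all hypothetical); the only point it demonstrates is that alignment scores and the context vector are computed over the concatenation m_j = (r_j; h_j) rather than over the hidden annotation h_j alone.

```python
import torch
import torch.nn.functional as F

def multi_level_context(r, h, query, W):
    """r: (T, D_r) raw input representations, h: (T, D_h) encoder annotations,
    query: (D_q,) decoder state, W: (D_q, D_r + D_h) scoring matrix."""
    m = torch.cat([r, h], dim=-1)      # m_j = (r_j; h_j)
    scores = m @ W.t() @ query         # one score per position j
    alpha = F.softmax(scores, dim=0)   # alignment weights over positions
    return alpha @ m                   # context built from m, not from h alone

# Example with random tensors:
T, D_r, D_h, D_q = 6, 16, 32, 32
ctx = multi_level_context(torch.randn(T, D_r), torch.randn(T, D_h),
                          torch.randn(D_q), torch.randn(D_q, D_r + D_h))
```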
“…This work focuses on exploiting multi-temporal, multi-spectral and spatial information together for improving land cover mapping through the use of RNNs. Recently, RNNs have been demonstrated to achieve significant results on sequential data and have been applied in different fields such as natural language processing [30], [34], [20], computer vision [32], [16], [39], multi-modal learning [22], [11], [15] and robotics [28]. RNNs have been applied to tasks such as language modeling, speech recognition, machine translation, question answering, object recognition, visual tracking, video analysis, image generation, image captioning, video captioning, self-driving cars, fraud detection, predictive modeling, and sentiment classification, among others.…”
Section: Introduction (mentioning; confidence: 99%)
“…Our basic model for RUN is a sequence-to-sequence model similar to the work of Mei et al. (2015) on SAIL, and inspired by Xu et al. (2015). It is based on Conditioned Generation with Attention (CGA).…”
Section: Models for RUN (mentioning; confidence: 99%)
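For context, a single decode step of conditioned generation with attention, the pattern the snippet above attributes to the RUN model, might look roughly like the sketch below: the decoder state is updated from the previous output embedding and an attention-weighted context, then mapped to a distribution over the next symbol. The class name CGAStep, the GRU cell, the bilinear scorer, and all dimensions are assumptions for illustration rather than the cited system's actual design.

```python
import torch
import torch.nn as nn

class CGAStep(nn.Module):
    def __init__(self, emb_dim, enc_dim, hidden, vocab):
        super().__init__()
        self.cell = nn.GRUCell(emb_dim + enc_dim, hidden)
        self.score = nn.Bilinear(hidden, enc_dim, 1)   # hypothetical attention scorer
        self.out = nn.Linear(hidden, vocab)

    def forward(self, prev_emb, state, annotations):
        # prev_emb: (emb_dim,), state: (hidden,), annotations: (T, enc_dim)
        q = state.expand(annotations.size(0), -1)
        alpha = torch.softmax(self.score(q, annotations).squeeze(-1), dim=0)
        context = alpha @ annotations                              # (enc_dim,)
        state = self.cell(torch.cat([prev_emb, context]).unsqueeze(0),
                          state.unsqueeze(0)).squeeze(0)
        return torch.log_softmax(self.out(state), dim=-1), state   # next-symbol log-probs
```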