This paper is the arXiv version of the paper that appears in the proceedings of EMNLP 2019. The content of the main paper is exactly the same as in the proceedings (modulo citation updates). However, the evaluation method used to obtain the results in the main paper unfortunately induces non-deterministic agent behavior, which makes comparisons difficult. We provide additional results herein, obtained via a deterministic evaluation scheme, in Appendix G. All conclusions and qualitative claims made in the main paper are unaffected by this change of evaluation scheme, and still hold on the new results. We strongly recommend that future work reference the results in Appendix G when comparing with our methods.
Abstract

Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop "Help, Anna!" (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and thus attains a higher task success rate on both previously seen and previously unseen environments. We publicly release code and data at https://github.com/khanhptnk/hanna.