DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

Gao, Xiaofeng; Gao, Qiaozi; Gong, Ran; Lin, Kaixiang; Thattai, Govind; Sukhatme, Gaurav S.

doi:10.48550/arxiv.2202.13330

Cited by 4 publications

(17 citation statements)

References 28 publications


“…Meanwhile, for tasks (Thomason et al, 2019b; Padmakumar et al, 2021) that do not provide an oracle agent to answer question in natural language, researchers also need to build a rule-based (Padmakumar et al, 2021) or neural-based (Roman et al, 2020) oracle. DialFRED (Gao et al, 2022) uses a language model as an oracle to answer questions.…”

Section: Asking For Help (mentioning)

confidence: 99%

“…
Name | Simulator | Language-Active | Environment
Room-to-Room (Anderson et al, 2018b) | Matterport3D | ✗ | Indoor
Room-for-Room | Matterport3D | ✗ | Indoor
Room-Across-Room (Ku et al, 2020) | Matterport3D | ✗ | Indoor
Landmark-RxR (He et al, 2021) | Matterport3D | ✗ | Indoor
XL-R2R (Yan et al, 2020) | Matterport3D | ✗ | Indoor
VLNCE (Krantz et al, 2020) | Habitat | ✗ | Indoor
StreetLearn | Google Street View | ✗ | Outdoor
StreetNav (Hermann et al, 2020) | Google Street View | ✗ | Outdoor
TOUCHDOWN | Google Street View | ✗ | Outdoor
Talk2Nav (Vasudevan et al, 2021) | Google Street View | ✗ | Outdoor
LANI (Misra et al, 2018) | - | ✗ | Outdoor
RoomNav (Wu et al, 2018) | House3D | ✗ | Indoor
EmbodiedQA (Das et al, 2018) | House3D | ✗ | Indoor
REVERIE | Matterport3D | ✗ | Indoor
SOON (Zhu et al, 2021a) | Matterport3D | ✗ | Indoor
IQA (Gordon et al, 2018) | AI2-THOR | ✗ | Indoor
CHAI (Misra et al, 2018) | CHALET | ✗ | Indoor
ALFRED (Shridhar et al, 2020) | AI2-THOR | ✗ | Indoor
VNLA | Matterport3D | ✓ | Indoor
HANNA (Nguyen and Daumé III, 2019) | Matterport3D | ✓ | Indoor
CEREALBAR | - | ✓ | Indoor
Just Ask (Chi et al, 2020) | Matterport3D | ✓ | Indoor
CVDN (Thomason et al, 2019b) | Matterport3D | ✓ | Indoor
RobotSlang (Banerjee et al, 2020) | - | ✓ | Indoor
Talk the Walk (de Vries et al, 2018) | - | ✓ | Outdoor
MC Collab (Narayan- | Minecraft | ✓ | Outdoor
TEACh (Padmakumar et al, 2021) | AI2-THOR | ✓ | Indoor
DialFRED (Gao et al, 2022) | AI2-THOR | ✓ | Indoor
…”

Section: Name (mentioning)

confidence: 99%


Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Gu, Stefani, Wu, et al. 2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.



“…The follower converses with the commander and interacts with the environment to complete various house tasks such as making coffee. DialFRED (Gao et al, 2022) extends ALFRED (Shridhar et al, 2020) dataset by allowing the agent to actively ask questions.…”

Section: Human Dialogue (mentioning)

confidence: 99%

“…Meanwhile, for tasks (Thomason et al, 2019b; Padmakumar et al, 2021) that do not provide an oracle agent to answer question in natural language, researchers also need to build a rule-based (Padmakumar et al, 2021) or neural-based (Roman et al, 2020) oracle. DialFRED (Gao et al, 2022) uses a language model as an oracle to answer questions.…”

Section: Asking For Help (mentioning)

confidence: 99%

“…Language-Active Environment
Room-to-Room (Anderson et al, 2018b) | Matterport3D | Indoor
Room-for-Room | Matterport3D | Indoor
Room-Across-Room (Ku et al, 2020) | Matterport3D | Indoor
Landmark-RxR (He et al, 2021) | Matterport3D | Indoor
XL-R2R (Yan et al, 2020) | Matterport3D | Indoor
VLNCE (Krantz et al, 2020) | Habitat | Indoor
StreetLearn | Google Street View | Outdoor
StreetNav (Hermann et al, 2020) | Google Street View | Outdoor
TOUCHDOWN | Google Street View | Outdoor
Talk2Nav (Vasudevan et al, 2021) | Google Street View | Outdoor
LANI (Misra et al, 2018) | - | Outdoor
RoomNav (Wu et al, 2018) | House3D | Indoor
EmbodiedQA (Das et al, 2018) | House3D | Indoor
REVERIE | Matterport3D | Indoor
SOON (Zhu et al, 2021a) | Matterport3D | Indoor
IQA (Gordon et al, 2018) | AI2-THOR | Indoor
CHAI (Misra et al, 2018) | CHALET | Indoor
ALFRED (Shridhar et al, 2020) | AI2-THOR | Indoor
VNLA | Matterport3D | Indoor
HANNA (Nguyen and Daumé III, 2019) | Matterport3D | Indoor
CEREALBAR | - | Indoor
Just Ask (Chi et al, 2020) | Matterport3D | Indoor
CVDN (Thomason et al, 2019b) | Matterport3D | Indoor
RobotSlang (Banerjee et al, 2020) | - | Indoor
Talk the Walk (de Vries et al, 2018) | - | Outdoor
MC Collab (Narayan- | Minecraft | Outdoor
TEACh (Padmakumar et al, 2021) | AI2-THOR | Indoor
DialFRED (Gao et al, 2022) | AI2-THOR | Indoor
) Habitat (Manolis Savva* et al, 2019 AI2-THOR Gibson (Xia et al, 2018) LANI (Misra et al, 2018) *Google Street View…”

Section: Name Simulator (mentioning)

confidence: 99%


Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Gu, Stefani, et al. 2022

Preprint


DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

Gao, Gong, et al. 2022

IEEE Robot. Autom. Lett.


We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects, for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks readily accessible to general human users, thus opening a new venue for high-efficiency HRI data collection and EAI system evaluation. Along with the platform, we introduce a dialog-enabled instruction-following benchmark and provide baseline results for it. We make Alexa Arena publicly available to facilitate research in building generalizable and assistive embodied agents.

