2022
DOI: 10.1609/aaai.v36i2.20097

TEACh: Task-Driven Embodied Agents That Chat

Abstract: Robots operating in human spaces must be able to engage in natural language interaction, both understanding and executing instructions, and using conversation to resolve ambiguity and correct mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human-human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with the environment t…
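To make the Commander/Follower setup described in the abstract concrete, below is a minimal, hypothetical sketch of how one such dialogue-and-action session might be represented in code. The class and field names here are illustrative assumptions only; they do not reflect the actual schema of the released TEACh dataset.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record types for a Commander/Follower session.
# These names are illustrative and are NOT the released TEACh format.

@dataclass
class Utterance:
    speaker: str          # "commander" or "follower"
    text: str             # natural-language dialogue turn

@dataclass
class Action:
    agent: str            # which agent acted (the Follower executes in the environment)
    action_type: str      # e.g. "Forward", "Pickup", "Place"
    object_id: str = ""   # target object, if any

@dataclass
class Session:
    task_name: str                                      # e.g. "Make Coffee"
    dialogue: List[Utterance] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)

# Example: a short exchange followed by a Follower action.
session = Session(task_name="Make Coffee")
session.dialogue.append(Utterance("commander", "First, find a mug in the cabinet."))
session.dialogue.append(Utterance("follower", "Which cabinet should I check?"))
session.actions.append(Action("follower", "OpenObject", "Cabinet_3"))
```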

Cited by 39 publications (25 citation statements)
References 31 publications
“…Embodied AI. The development of learning-based embodied AI agents has made significant progress across a wide variety of tasks, including: scene rearrangement [3,17,38], object-goal navigation [1,6,8,19,41,43], point-goal navigation [1,19,30,31,40], scene exploration [7,10], embodied question answering [12,18], instructional navigation [2,35], object manipulation [14,44], home task completion with explicit instructions [27,35,36], active visual learning [9,15,20,39], and collaborative task completion with agent-human conversations [29]. While these works have driven much progress in embodied AI, ours is the first agent to tackle the task of tidying up rooms, which requires commonsense reasoning about whether or not an object is out of place, and inferring where it belongs in the context of the room.…”
Section: Related Work (mentioning)
confidence: 99%
“…Vision-and-Language Navigation. Training embodied navigation agents has been an increasingly active research area (Anderson et al, 2018a,b; Chen et al, 2019; Ku et al, 2020; Shridhar et al, 2020; Padmakumar et al, 2022). Fried et al (2018b) propose to augment the training data with the speaker-follower models, which is improved by Tan et al (2019), who add noise to the environments so that the speaker can generate more diverse instructions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Researchers in the Interaction Lab have shown that previous work on so-called 'Visual Dialog' does not really require taking dialogue context into account, and proposed new visual dialogue datasets where linguistic context matters [3]. We are currently working to further develop interactive systems for learning grounded language, for example within the 2022 Amazon Alexa SimBot challenge [47,63]. Fig.…”
Section: Vision and Language (mentioning)
confidence: 99%
“…More recently, researchers in the Interaction Lab have developed deep learning systems such as 'Embodied BERT' (EmBERT) [62] which combine video streams and language to learn grounded language and action execution. Related to this work, we are currently the only European team participating in the Amazon Alexa SimBot challenge (2022), which works on the TEACh dataset [47] of videos combined with conversations about household tasks (see Fig. 3).…
Section: Embodied Interaction (mentioning)
confidence: 99%