“…While deep learning has demonstrated great success in various application domains [Russakovsky et al., 2015, Silver et al., 2016], large-scale annotated data for supervision inevitably becomes the bottleneck. Many works thus explore self-supervised learning via active perception [Wilkes & Tsotsos, 1992], interactive perception [Bohg et al., 2017], or interactive exploration [Wyatt et al., 2011] to learn visual representations [Fang et al., 2020, Jayaraman & Grauman, 2018, Weihs et al., 2019, Zakka et al., 2020], objects and poses [Caicedo & Lazebnik, 2015, Chaplot et al., 2020b, Choi et al., 2021], segmentation and parts [Eitel et al., 2019, Gadre et al., 2021, Katz & Brock, 2008, Kenney et al., 2009, Lohmann et al., 2020, Pathak et al., 2018, Van Hoof et al., 2014], physics and dynamics [Agrawal et al., 2016, Ehsani et al., 2020, Janner et al., 2018, Li et al., 2016, Lohmann et al., 2020, Mottaghi et al., 2016, Wu et al., 2015], manipulation skills [Agrawal et al., 2016, Batra et al., 2020, Zeng et al., 2018], navigation policies [Anderson et al., 2018, Chaplot et al., 2020a, Ramakrishnan et al., 2021], etc. In this work, we design interactive policies to explore novel 3D indoor rooms and learn our newly proposed inter-object functional relationships.…”