Visual Semantic Planning Using Deep Successor Representations

Zhu, Yuke; Gordon, Daniel; Kolve, Eric; Fox, Dieter; Li, Feifei; Gupta, Abhinav; Mottaghi, Roozbeh; Farhadi, Ali

doi:10.1109/iccv.2017.60

Cited by 128 publications

(113 citation statements)

References 36 publications

Supporting

Mentioning

110

Contrasting


“…Gandhi et al [19] collect a dataset of drone crashes and train self-supervised agents to avoid obstacles. A number of new challenging tasks have been proposed including instruction-based navigation [6,7], target-driven navigation [2,4], embodied/interactive question answering [1,9], and task planning [5].…”

Section: Related Work (mentioning)

confidence: 99%

Embodied Question Answering in Photorealistic Environments With Point Cloud Perception

Wijmans, Datta, Maksymets et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task, Embodied Question Answering [1], in photo-realistic environments (Matterport 3D). We thoroughly study navigation policies that utilize 3D point clouds, RGB images, or their combination. Our analysis of these models reveals several key findings. We find that two seemingly naive navigation baselines, forward-only and random, are strong navigators and challenging to outperform, due to the specific choice of the evaluation setting presented by [1]. We find that a novel loss-weighting scheme we call Inflection Weighting is important when training recurrent models for navigation with behavior cloning, and we are able to outperform the baselines with this technique. We find that point clouds provide a richer signal than RGB images for learning obstacle avoidance, motivating the use (and continued study) of 3D deep learning models for embodied navigation.



“…Autonomous agents, controlled by neural network policies and trained with reinforcement learning algorithms, have been used in a wide range of robot navigation applications [1,2,3,23,28,32,47,50,51]. In many of these applications, the agent needs to perform tasks over long time horizons in unseen environments.…”

Section: Introduction (mentioning)

confidence: 99%

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

Fang, Toshev et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite


Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.


“…These properties should be replicated in neural networks if they are to serve as accurate models of natural intelligence. New neural network architectures are slowly starting to take steps in this direction (e.g., Santoro et al., 2017; Zhu et al., 2017; Louizos et al., 2017).…”

Section: Reasoning (mentioning)

confidence: 99%

Computational Foundations of Natural Intelligence

Gerven 2017

Preprint


New developments in AI and neuroscience are revitalizing the quest to understand natural intelligence, offering insight into how to equip machines with human-like capabilities. This paper reviews some of the computational principles relevant for understanding natural intelligence and, ultimately, achieving strong AI. After reviewing basic principles, a variety of computational modeling approaches is discussed. Subsequently, I concentrate on the use of artificial neural networks as a framework for modeling cognitive processes. This paper ends by outlining some of the challenges that remain to fulfill the promise of machines that show human-like intelligence.

